Using Ambari to Rapidly Deploy a Hadoop Big Data Environment
Date added: 2018-11-21

Over the past year my work has been back-end development related to big data. As the Hadoop community keeps growing, I keep trying new things. This article focuses on Ambari, a new Apache project designed to let you quickly configure and deploy the components of the Hadoop ecosystem, and to provide maintenance and monitoring functions for them.

As a novice, let me describe my own learning path. At the very beginning I did the obvious thing: I Googled Hadoop, downloaded the packages, and installed a single-node setup on my own virtual machine (CentOS 6.3). I wrote a few test classes, tried CRUD-style operations, and ran Map/Reduce jobs. At that point I did not understand Hadoop very well. I kept reading articles about its overall architecture, and all I had really done was edit a few configuration files under conf until Hadoop ran normally. After that stage I moved on to HBase, another product in the Hadoop ecosystem: again I edited the configuration, started the services with start-all.sh and start-hbase.sh, modified my programs, and tested them. Along with HBase I learned ZooKeeper, Hive and similar tools, and after that I began studying Hadoop 2.0. This gave me an overall picture of the Hadoop ecosystem, although at work I have only used the technologies our projects required.

But as someone who explores this as a hobby, I wanted to know more. How does it perform? How exactly does it work? In presentations from large companies such as Taobao, clusters have dozens, hundreds or even thousands of nodes: how do they manage them, and how do those clusters perform? Looking at the performance curves in those slides, could I understand them in detail and apply the same tuning to my own projects? I seem to have found the answer in Ambari, a Hadoop-related project developed by Hortonworks; see the official site for details.
Getting to know the Hadoop ecosystem

Keywords we often see include HDFS, MapReduce, HBase, Hive, ZooKeeper, Pig, Sqoop, Oozie, Ganglia, Nagios, CDH3, CDH4, Flume, Scribe, Fluentd, HttpFS and many more. The Hadoop ecosystem is now quite prosperous, but who is behind that prosperity? Readers familiar with Hadoop's history may know that Hadoop started at Yahoo, but it is now maintained mainly by two companies, Hortonworks and Cloudera, which employ most of the committers. As a result there are two major lines of releases on the market: Cloudera's CDH series and the community edition. I first used the community edition, later switched to CDH3, and have now switched back to the community edition because of Ambari. Which one you use matters less than knowing the technology well enough to modify it and keep it running. Enough background; let's get to installing Ambari.
Begin deployment

First, get to know Ambari. Project address: http://incubator.apache.org/ambari/

Installation documentation: http://incubator.apache.org/ambari/1.2.2/installing-hadoop-using-ambari/content/index.html

Hortonworks has written an article describing the installation, which I translated: http://www.linuxidc.com/Linux/2014-05/101530.htm. Before installing, read the installation documentation carefully and configure the package repositories that match your operating system version. The installation takes a relatively long time, so follow each step of the documentation carefully. Below I describe the problems I ran into.

Here is my own installation process.

Machine preparation:

My test environment uses nine old HP machines, cloud100 through cloud108, with cloud108 as the management node.

Paths used by the Ambari installation:

Installation directories on each machine:

/usr/lib/hadoop
/usr/lib/hbase
/usr/lib/zookeeper
/usr/lib/hcatalog
/usr/lib/hive

Log directories (look here for error messages):

/var/log/hadoop
/var/log/hbase

Configuration file directories:

/etc/hadoop
/etc/hbase
/etc/hive

HDFS storage directory:

/hadoop/hdfs
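The layout above can be checked on a node with a small script, for example to see what a failed installation left behind. This is a minimal sketch; the optional ROOT prefix argument is a hypothetical convenience for trying it against a scratch directory and is not part of Ambari.

```shell
#!/usr/bin/env bash
# Sketch: report which of the Ambari-managed paths listed above exist on this node.
# The optional first argument is a hypothetical ROOT prefix (empty by default),
# useful for testing against a scratch directory; it is not an Ambari setting.

AMBARI_PATHS=(
  /usr/lib/hadoop /usr/lib/hbase /usr/lib/zookeeper /usr/lib/hcatalog /usr/lib/hive
  /var/log/hadoop /var/log/hbase
  /etc/hadoop /etc/hbase /etc/hive
  /hadoop/hdfs
)

# Print "present <path>" or "missing <path>" for each expected path.
check_layout() {
  local root="${1:-}" p
  for p in "${AMBARI_PATHS[@]}"; do
    if [ -e "$root$p" ]; then
      echo "present $p"
    else
      echo "missing $p"
    fi
  done
}
```

Running `check_layout` with no argument inspects the real paths on the current machine.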


Points to note during installation:

1. Before installing, set up password-free SSH login to every machine, as described in http://www.linuxidc.com/Linux/2014-05/101532.htm. Once that is done, the management node can log in to each cluster node without a password.

2. If Hadoop-related services were installed on a machine before, unset their environment variables, in particular HBASE_HOME from an earlier HBase configuration. I had put these paths in /etc/profile and they interfered with HBase, because the paths Ambari installs to may differ from those of your earlier installation.

3. On the service-assignment page, put the NameNode and the SNameNode (Secondary NameNode) on the same host. I had tried separating them earlier in preparation for HA, but the SNameNode crashed and startup failed; I will spend time on HA another time.

4. In my installation, the JobTracker failed to start when it was not on the same host as the NameNode.

5. The number of DataNodes cannot be lower than the block replication factor; with the default replication factor of 3, you need at least 3 DataNodes.

6. On the Confirm Hosts step, read the warnings carefully and resolve all of them; some warnings will cause installation errors.

7. Take note of the new user accounts the installation creates; you will need to use them.

8. I deployed Hive and the HBase Master on the same node, although they can of course be separated. Once the assignments are set, start the installation.

9. If the installation fails, reinstall as follows.

First, delete the directories the installation has already created on each machine:

sh file_cp.sh cmd "rm -rf /usr/lib/hadoop && rm -rf /usr/lib/hbase && rm -rf /usr/lib/zookeeper"

sh file_cp.sh cmd "rm -rf /etc/hadoop && rm -rf /etc/hbase && rm -rf /hadoop && rm -rf /var/log/hadoop"

sh file_cp.sh cmd "rm -rf /etc/ganglia && rm -rf /etc/hcatalog && rm -rf /etc/hive && rm -rf /etc/nagios && rm -rf /etc/sqoop && rm -rf /var/log/hbase && rm -rf /var/log/nagios && rm -rf /var/log/hive && rm -rf /var/log/zookeeper && rm -rf /var/run/hadoop && rm -rf /var/run/hbase && rm -rf /var/run/zookeeper"

Then remove the installed packages with yum:

sh file_cp.sh cmd "yum -y remove ambari-log4j hadoop hadoop-lzo hbase hive libconfuse nagios sqoop zookeeper"

Here file_cp.sh is a small shell script of my own that makes it easy to run a command on multiple machines.


Then reset the Ambari server:

ambari-server stop


ambari-server reset


ambari-server start

10. Keep the clocks on all machines synchronized; clock differences can cause RegionServers to fail.

11. Turn off iptables. Because machines may be rebooted, stopping the service (service iptables stop) is not enough; also disable it at boot with chkconfig iptables off.
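The author's file_cp.sh was not published. A minimal sketch of such a helper, assuming root SSH access to the nine hosts cloud100 through cloud108, might look like this; the `cmd`/`cp`/`trust` subcommands, the DRY_RUN switch and the `run` wrapper are hypothetical. It uses bash arrays and brace expansion, so run it with bash rather than a strict POSIX sh.

```shell
#!/usr/bin/env bash
# file_cp.sh: hypothetical sketch of a multi-host helper (not the author's script).
# Assumes root SSH access to every host.
#   bash file_cp.sh cmd "<command>"   run a command on every host
#   bash file_cp.sh cp <src> <dest>   copy a local file to every host
#   bash file_cp.sh trust             set up password-free SSH from this node
# Set DRY_RUN=1 to print the commands instead of running them.

HOSTS=(cloud{100..108})   # the nine test machines; cloud108 is the management node

# Run "$@", or only print it when DRY_RUN is set.
run() {
  if [ -n "${DRY_RUN:-}" ]; then
    printf '+ %s\n' "$*"
  else
    "$@"
  fi
}

cmd_all() {
  local h
  for h in "${HOSTS[@]}"; do
    echo "== $h"
    run ssh "root@$h" "$1"
  done
}

cp_all() {
  local h
  for h in "${HOSTS[@]}"; do
    run scp "$1" "root@$h:$2"
  done
}

# Create a key pair once (empty passphrase), then push the public key to every host.
trust_all() {
  local h
  [ -f ~/.ssh/id_rsa ] || run ssh-keygen -t rsa -N "" -f ~/.ssh/id_rsa
  for h in "${HOSTS[@]}"; do
    run ssh-copy-id "root@$h"
  done
}

case "${1:-}" in
  cmd)   cmd_all "$2" ;;
  cp)    cp_all "$2" "$3" ;;
  trust) trust_all ;;
esac
```

For example, `bash file_cp.sh trust` once for step 1, then `bash file_cp.sh cmd "service iptables stop && chkconfig iptables off"` for step 11; prefix either with `DRY_RUN=1` to preview what would run.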
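For step 2, leftovers from a manual installation can be spotted by grepping the profile files and checking the current shell. A sketch, assuming the variable names in STALE_VARS (a guess at what a manual install typically exports; extend it to match yours). Remove any lines it reports, then `unset` the variables or open a new login shell.

```shell
#!/usr/bin/env bash
# Sketch for step 2: find Hadoop-related exports left over from an earlier install.
# STALE_VARS is an assumed list; adjust it to whatever you exported before.
STALE_VARS='HADOOP_HOME|HBASE_HOME|HIVE_HOME|ZOOKEEPER_HOME|HADOOP_CONF_DIR|HBASE_CONF_DIR'

# Print "file:line:text" for each matching export line.
# Defaults to /etc/profile and ~/.bashrc; returns non-zero if nothing is found.
find_stale_exports() {
  [ $# -gt 0 ] || set -- /etc/profile ~/.bashrc
  grep -nHE "^[[:space:]]*export[[:space:]]+($STALE_VARS)=" "$@" 2>/dev/null
}

# Print NAME=value for each of those variables set in the current shell.
find_stale_env() {
  local v
  for v in ${STALE_VARS//\|/ }; do
    [ -n "${!v:-}" ] && echo "$v=${!v}"
  done
  return 0
}
```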
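For step 10, the clock spread can be measured from the management node. The HBase Master rejects a RegionServer whose clock differs from its own by more than `hbase.master.maxclockskew` (30000 ms by default), so second-level precision is enough. A sketch, assuming password-free root SSH; hosts are queried one after another, so the spread also includes the time between SSH calls, and a second or two is noise.

```shell
#!/usr/bin/env bash
# Sketch for step 10: measure clock skew across the cluster.

# Read "host epoch_seconds" lines on stdin; print the spread (max - min) in seconds.
clock_spread() {
  awk 'NR == 1 { min = $2; max = $2 }
       { if ($2 < min) min = $2; if ($2 > max) max = $2 }
       END { if (NR) print max - min }'
}

# Collect epoch seconds from each host over SSH, one host at a time.
collect_clocks() {
  local h
  for h in "$@"; do
    echo "$h $(ssh -o BatchMode=yes "root@$h" date +%s)"
  done
}
```

Usage: `collect_clocks cloud{100..108} | clock_spread`. If the result is more than a few seconds, synchronize the machines with NTP before starting HBase.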

When the installation is complete, log in to check the status of the services at:

http://<management node IP>:8080

Before logging in, you need the account name and password that were set when Ambari Server was installed.

From the web UI you can also view the Ganglia and Nagios monitoring pages.
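Besides the browser, the same server exposes Ambari's REST API under /api/v1 on the same port. A minimal sketch for checking it from a shell; the host defaults to cloud108 (the management node in this article), and AMBARI_USER/AMBARI_PASS are placeholders for the account set during installation.

```shell
#!/usr/bin/env bash
# Sketch: query the Ambari REST API from the command line.
# AMBARI_USER and AMBARI_PASS are placeholders for the account set at install time.
AMBARI_HOST="${AMBARI_HOST:-cloud108}"
AMBARI_PORT="${AMBARI_PORT:-8080}"

# Build the URL of a REST resource such as "clusters".
ambari_url() {
  echo "http://$AMBARI_HOST:$AMBARI_PORT/api/v1/$1"
}

# GET a resource and print the JSON response (DRY_RUN=1 only prints the request).
ambari_get() {
  local url
  url="$(ambari_url "$1")"
  if [ -n "${DRY_RUN:-}" ]; then
    printf '+ %s\n' "curl -s -u $AMBARI_USER:*** $url"
  else
    curl -s -u "$AMBARI_USER:$AMBARI_PASS" "$url"
  fi
}
```

For example, `AMBARI_USER=... AMBARI_PASS=... ambari_get clusters` lists the clusters the server manages.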


After installation, everything above looked normal. Do you still need to test it yourself? The installer already runs smoke tests, and if they pass the cluster is basically working, but it is still worth exercising each component by hand:

Verify HDFS

Verify Map / Reduce

Verify HBase

Verify Hive
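The manual checks above can be sketched as shell commands. Assumptions, not from the article: Hadoop 1.x shell syntax (as in the Ambari 1.2 era), a user allowed to write under /tmp in HDFS, and the MapReduce examples jar at /usr/lib/hadoop/hadoop-examples.jar (override EXAMPLES_JAR if your path differs). DRY_RUN=1 prints the commands without running them.

```shell
#!/usr/bin/env bash
# Hypothetical smoke tests for HDFS, MapReduce, HBase and Hive.
# Assumes Hadoop 1.x shell syntax and the examples jar path below; adjust as needed.
EXAMPLES_JAR="${EXAMPLES_JAR:-/usr/lib/hadoop/hadoop-examples.jar}"
SMOKE_DIR="${SMOKE_DIR:-/tmp/smoke}"

# Run "$@", or only print it when DRY_RUN is set.
run() {
  if [ -n "${DRY_RUN:-}" ]; then
    printf '+ %s\n' "$*"
  else
    "$@"
  fi
}

smoke_hdfs() {
  run hadoop fs -mkdir "$SMOKE_DIR" &&
  run hadoop fs -put /etc/hosts "$SMOKE_DIR/hosts" &&
  run hadoop fs -cat "$SMOKE_DIR/hosts"
}

smoke_mapreduce() {
  # wordcount over the file uploaded by smoke_hdfs; HDFS expands the part-* glob
  run hadoop jar "$EXAMPLES_JAR" wordcount "$SMOKE_DIR/hosts" "$SMOKE_DIR/wc-out" &&
  run hadoop fs -cat "$SMOKE_DIR/wc-out/part-*"
}

smoke_hbase() {
  # hbase shell reads commands from stdin: cluster status, then the table list
  run sh -c 'printf "status\nlist\n" | hbase shell'
}

smoke_hive() {
  run hive -e 'show databases;'
}

smoke_cleanup() {
  run hadoop fs -rmr "$SMOKE_DIR"   # -rmr is the Hadoop 1.x recursive delete
}

smoke_all() {
  smoke_hdfs && smoke_mapreduce && smoke_hbase && smoke_hive
}
```

Run `smoke_all` as a user with an HDFS home directory, inspect the output, then `smoke_cleanup`.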

Summary

At this point Hadoop, HBase and Hive are installed and configured, and the next step is stress testing along with other kinds of tests. Because Ambari installs Hortonworks' RPM packaging of the Hadoop source code, it may differ somewhat from other distributions. For a development environment this has little impact for now, but I have not yet used it in production, so I cannot say how stable it is; I will list the bugs I run into as my projects progress. Overall, Ambari is well worth using: it saves a lot of unnecessary configuration time, and compared with a single-node setup, a cluster lets you do performance and tuning tests that are much closer to production. The Ganglia and Nagios monitoring it sets up also lets us view cluster metrics. In general I recommend it; a new tool inevitably has bugs, but they will be fixed as it matures. If I have time, I would like to extend Ambari Server, for example by adding monitoring options for common high-performance components such as Redis and Nginx. That's it for now; welcome to Ambari.

// Update:

Some problems I have recently run into with Ambari:

1. Even after enabling the append option in the custom configuration, appending to files still does not work.
  CopyRight 2002-2016 newfreesoft.com, All Rights Reserved.