  Use Ambari rapid deployment Hadoop big data environment
  Add Date : 2018-11-21      

I have been doing big-data backend development for the past year, and as the Hadoop community keeps evolving there are always new things to try. This article focuses on Ambari, a newer Apache project (started by HortonWorks) designed to let you quickly configure and deploy the components of the Hadoop ecosystem, and to provide maintenance and monitoring for them.

As a novice, let me describe my own learning path. At the beginning I simply googled Hadoop, downloaded the packages, and installed a standalone instance in my own virtual machine (CentOS 6.3) for testing. I wrote a few test classes, did some CRUD-style tests, and ran Map/Reduce jobs. At that point I did not understand Hadoop very well; I kept reading articles about the overall architecture, and all I really did myself was modify a few configuration files under conf so that Hadoop would run normally. After that stage I also used HBase, another product of the Hadoop ecosystem: again modify the configuration, bring the services up with start-all.sh and start-hbase.sh, then adapt my program and test. Along with HBase I picked up some ZooKeeper and Hive, and after that operational phase I began studying Hadoop 2.0, gradually gaining some understanding of the Hadoop ecosystem as a whole; the technologies involved in my company's own development went only that far.

But for someone who explores out of interest and wants to know more, questions remain: how does it perform, and how exactly does it work? Looking at the slide decks from large companies (Taobao and the like), they casually run dozens, hundreds, even thousands of nodes. How do they manage them, and what does the performance look like? Looking at the curves in those performance-test slides, could you achieve the same understanding and tuning on your own projects? I seem to have found the answer, and it is Ambari, a Hadoop-related project developed by HortonWorks; see the official site for the details.
Learning the Hadoop ecosystem

Some of the keywords we now see all the time are HDFS, MapReduce, HBase, Hive, ZooKeeper, Pig, Sqoop, Oozie, Ganglia, Nagios, CDH3, CDH4, Flume, Scribe, Fluentd, HttpFS, and there are surely more. The Hadoop ecosystem is fairly prosperous by now, and who is driving that prosperity? Those who have read Hadoop's history may know that Hadoop started at Yahoo, but today it is maintained mainly by two companies, HortonWorks and Cloudera, to which most of the committers belong. That is why the market now sees two major lines: the CDH series and the community edition. I first used the community edition, later switched to CDH3, and have now switched back to the community edition because of Ambari. Of course, it matters less which one you use: as long as your own skills are solid, either can be modified to run normally. Enough preamble; let us get to installing Ambari.
Beginning the deployment

First, get to know Ambari. Project address: http://incubator.apache.org/ambari/

Installation documentation: http://incubator.apache.org/ambari/1.2.2/installing-hadoop-using-ambari/content/index.html

HortonWorks wrote an article describing how to install it; my translation is here: http://www.linuxidc.com/Linux/2014-05/101530.htm. Please read it together with the installation documentation. You must read the installation documentation carefully, configure the appropriate package source for the system version you are currently using, and expect the installation process to take quite a long time, so do each step of the documentation seriously. Here I will describe the problems I ran into.

Below is my own installation process.

Machine preparation:

My test environment uses nine old HP machines, cloud100 through cloud108, with cloud108 as the management node.

Paths used by the Ambari installation:

Installation directory on each machine:

/usr/lib/hadoop

/usr/lib/hbase

/usr/lib/zookeeper

/usr/lib/hcatalog

/usr/lib/hive

Log paths (when you need to track down an error, look in these log directories):

/var/log/hadoop

/var/log/hbase

Configuration file paths:

/etc/hadoop

/etc/hbase

/etc/hive

HDFS storage path:

/hadoop/hdfs


Points to note during the installation:

1. Before installing, set up passwordless SSH login from the management node to every machine, as mentioned in http://www.linuxidc.com/Linux/2014-05/101532.htm; once that is done, the management node can log in to each cluster node without a password.
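The passwordless-login setup can be sketched roughly as follows (the hostnames are the ones used in this article; ssh-keygen and ssh-copy-id are standard OpenSSH tools, and the default key path is an assumption):

```shell
# On the management node (cloud108): create a key without a passphrase
# if one does not already exist, then push it to every cluster node.
test -f "$HOME/.ssh/id_rsa" || ssh-keygen -t rsa -N "" -f "$HOME/.ssh/id_rsa"

for h in cloud100 cloud101 cloud102 cloud103 cloud104 cloud105 cloud106 cloud107 cloud108; do
    ssh-copy-id "$h"   # appends the public key to ~/.ssh/authorized_keys on $h
done
```

Afterwards, `ssh cloud100` from the management node should log in without prompting for a password.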

2. If Hadoop-related services were installed on these machines before, and in particular if environment variables such as HBASE_HOME are configured, you need to unset them; they will interfere with the installation, because the paths Ambari installs to may differ from your earlier installation. In my case I had put those paths into /etc/profile, which ended up breaking HBase.

3. On the service-selection page, the NameNode and the SecondaryNameNode need to be placed together. I tried to split them apart in preparation for HA, but the SecondaryNameNode kept going down and startup failed; HA will have to wait until next time.

4. Similarly, when the JobTracker was not placed together with the NameNode, it failed to start.

5. The number of DataNodes cannot be less than the block replication factor; basically you need at least three.

6. On the Confirm Hosts step, pay attention to the information in the Warnings and resolve all of them; some Warnings will cause installation errors.

7. Take note of the users created during the installation; you will need to use these users later.

8. I deployed Hive and the HBase Master on the same node; you can of course separate them. Once everything is set, start the installation.

9. If the installation fails, here is how to reinstall.

First, delete the directories that the installer has already laid down on each machine:

sh file_cp.sh cmd "rm -rf /usr/lib/hadoop && rm -rf /usr/lib/hbase && rm -rf /usr/lib/zookeeper"

sh file_cp.sh cmd "rm -rf /etc/hadoop && rm -rf /etc/hbase && rm -rf /hadoop && rm -rf /var/log/hadoop"

sh file_cp.sh cmd "rm -rf /etc/ganglia && rm -rf /etc/hcatalog && rm -rf /etc/hive && rm -rf /etc/nagios && rm -rf /etc/sqoop && rm -rf /var/log/hbase && rm -rf /var/log/nagios && rm -rf /var/log/hive && rm -rf /var/log/zookeeper && rm -rf /var/run/hadoop && rm -rf /var/run/hbase && rm -rf /var/run/zookeeper"

Then remove the related packages that were installed through yum:

sh file_cp.sh cmd "yum -y remove ambari-log4j hadoop hadoop-lzo hbase hive libconfuse nagios sqoop zookeeper"

file_cp.sh here is a shell script I wrote myself to make it easy to execute commands across multiple machines:
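The script itself is not reproduced in the article; a minimal sketch of such a helper might look like this (the host list is the one from this article, and the overridable RUN variable is my own addition so the loop can be dry-run with RUN=echo):

```shell
# Run one command on every node of the cluster over ssh.
# HOSTS: the cluster nodes (defaults to this article's machines).
# RUN:   the remote runner; defaults to ssh, override for dry runs.
HOSTS="${HOSTS:-cloud100 cloud101 cloud102 cloud103 cloud104 cloud105 cloud106 cloud107 cloud108}"
RUN="${RUN:-ssh}"

run_on_all() {
    # Execute "$@" on each host, printing a header line per host.
    for h in $HOSTS; do
        echo "== $h =="
        $RUN "$h" "$@"
    done
}
```

The article's `sh file_cp.sh cmd "..."` invocations would map onto calls like `run_on_all "rm -rf /usr/lib/hadoop"`.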


Then reset the Ambari server:

ambari-server stop


ambari-server reset


ambari-server start

10. Pay attention to time synchronization; clock drift between nodes can cause the RegionServers to go down.

11. iptables needs to be turned off. Since a machine may be restarted at some point, `service iptables stop` alone is not enough; you also need to disable it with chkconfig.
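Points 10 and 11 come down to commands like the following (CentOS 6 sysvinit style, matching this article's environment; using ntpd for the time synchronization is my assumption, since the article does not name a tool):

```shell
# Stop the firewall now, and keep it off across reboots.
service iptables stop
chkconfig iptables off

# Keep the clocks in sync so the HBase RegionServers stay up.
service ntpd start
chkconfig ntpd on
```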

Finally, after the installation completes, log in at the following address to view the services:

http://<management node ip>:8080 — in my case, the address of cloud108. Before logging in, you need the account and password that were set during the Ambari server installation; enter them to get in.

View the Ganglia monitoring

View the Nagios monitoring


After the installation completes and everything looks normal, do you still need to test it yourself? The installer already ran smoke tests and they passed, so things are basically fine, but it is still worth doing a few operations by hand:

Verify HDFS

Verify Map / Reduce

Verify HBase

Verify Hive
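The four checks above can be done with a handful of commands. The ones below are a sketch under my own assumptions: the examples jar name and its path vary between Hadoop releases, and the hbase and hive invocations assume the clients are on your PATH.

```shell
# HDFS: create a directory, upload a file, read it back.
hadoop fs -mkdir /tmp/ambari-smoke
hadoop fs -put /etc/hosts /tmp/ambari-smoke/
hadoop fs -cat /tmp/ambari-smoke/hosts

# Map/Reduce: run the bundled pi-estimator example job.
hadoop jar /usr/lib/hadoop/hadoop-examples.jar pi 2 10

# HBase: check cluster status from the shell.
echo "status" | hbase shell

# Hive: run a trivial query.
hive -e "SHOW TABLES;"
```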

Summary

At this point the configuration of Hadoop and of the related HBase and Hive has been completed; what remains is stress testing and other kinds of tests. The Hadoop that ships with Ambari is HortonWorks' packaged RPM build of the Hadoop sources, so there may be some differences from other versions, but as a development environment that has little impact for now. I have not yet used it in production, so I will not claim anything about its stability; as development proceeds, I will list the bugs I encounter.

Overall, Ambari is well worth using. It removes a lot of unnecessary configuration time, and compared with a standalone environment, a clustered one lets you do performance and tuning tests much closer to production. The Ganglia and Nagios monitoring it configures also lets us view data about the cluster. In general I recommend it; bugs are inevitable in something new, but it will keep improving as people use it. If I have time, I would also like to extend the Ambari server's functionality, for example adding monitoring options for common high-performance components such as redis and nginx. That is all for now; welcome, Ambari.

// Update:

Some problems I have recently run into with Ambari:

1. After turning on the append option in the custom configuration, appends still do not work.
  CopyRight 2002-2020 newfreesoft.com, All Rights Reserved.