  Using Ambari to Rapidly Deploy a Hadoop Big Data Environment
  Add Date : 2018-11-21      

I have been doing big-data back-end development for the past year, and as the Hadoop community keeps evolving I keep trying new things. This article is about Ambari, a new Apache project designed to let you quickly configure and deploy the components of the Hadoop ecosystem, and to provide maintenance and monitoring for the resulting environment.

As a newcomer, let me describe my own learning path. At the beginning I simply googled Hadoop, downloaded the packages, and installed a standalone instance in my own virtual machine (CentOS 6.3) for testing. I wrote a few test classes, ran some CRUD-style tests, and ran Map/Reduce test jobs. At that point I did not understand Hadoop very deeply; I kept reading articles about the overall architecture, and all I really did myself was edit a few configuration files under conf until Hadoop ran normally, tweaking the configuration several times along the way.

After that stage I also used HBase, another product in the Hadoop ecosystem: again edit the configuration, run start-all.sh and start-hbase.sh to bring the services up, then adapt my own programs and test. Along with HBase I picked up some ZooKeeper and Hive, and after that operational phase I began studying Hadoop 2.0, which gave me some overall understanding of the Hadoop ecosystem. The technologies involved in my company's projects only went that far.

But anyone who explores these systems out of curiosity wants to know more. How does it perform? How exactly does it work? In the slide decks from large companies (Taobao and the like) you see tens, hundreds, even thousands of nodes; how do they manage them, and what does their performance look like? Looking at the performance curves in those slides, could I get the same detailed understanding and do tuning on my own projects? I seem to have found the answer: Ambari, a Hadoop-related project developed by Hortonworks. See the official site for the details.
Understanding the Hadoop Ecosystem

Keywords we now see frequently include HDFS, MapReduce, HBase, Hive, ZooKeeper, Pig, Sqoop, Oozie, Ganglia, Nagios, CDH3, CDH4, Flume, Scribe, Fluentd, HttpFS, and many more; the Hadoop ecosystem is flourishing. Who is driving that prosperity? Readers who know Hadoop's history may recall that it started at Yahoo, but today it is maintained mainly by two companies, Hortonworks and Cloudera, which employ most of the committers. That is why the market now sees two major lines: the CDH series and the community edition. I first used the community edition, later switched to CDH3, and have now switched back to the community edition because of Ambari. Which line you use matters less than knowing the technology well enough to modify it and keep it running normally. Enough preamble; let's get on with installing Ambari.
Beginning Deployment

First, get to know Ambari. Project address: http://incubator.apache.org/ambari/

Installation documentation: http://incubator.apache.org/ambari/1.2.2/installing-hadoop-using-ambari/content/index.html

Hortonworks wrote an article describing how to install it; my translation is at http://www.linuxidc.com/Linux/2014-05/101530.htm. Please read it alongside the installation documentation. You must work through the installation documentation seriously, configure the appropriate package sources for the OS version you are currently using, and be prepared for a fairly long installation; do each step in the documentation carefully. Below I list the problems I ran into.

Here is my own installation process.

Machine preparation:

My test environment uses nine retired HP machines, cloud100 through cloud108, with cloud108 as the management node.

Paths used by the Ambari installation:

Each machine installation directory:

/usr/lib/hadoop

/usr/lib/hbase

/usr/lib/zookeeper

/usr/lib/hcatalog

/usr/lib/hive

Log paths; when you need to look at error information, check these log directories:

/var/log/hadoop

/var/log/hbase

Path to the configuration file

/etc/hadoop

/etc/hbase

/etc/hive

Storage path of HDFS

/hadoop/hdfs


Points to note during the installation:

1. Before installing, set up passwordless ssh login on every machine, as described in http://www.linuxidc.com/Linux/2014-05/101532.htm. Once that is done, the management node can log in to every cluster node this way.
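The passwordless-login setup referenced above usually boils down to something like the following, run on the management node. This is a generic sketch, not the linked article's exact steps; the root user and the cloud100 host name are taken from this article's environment and may differ in yours.

```shell
# Generate a key pair on the management node (accept the defaults, empty passphrase)
ssh-keygen -t rsa

# Copy the public key to each cluster node (repeat for cloud100 .. cloud107)
ssh-copy-id root@cloud100

# Verify: this should print the remote hostname without prompting for a password
ssh root@cloud100 hostname
```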

2. If Hadoop-related services were installed on your machines before, in particular if HBase's HBASE_HOME environment variable is set, you need to `unset` it, because the leftover environment variable interferes. In my case I had put those paths into /etc/profile earlier, which affected HBase, since the paths Ambari installs to may differ from the ones used by your previous installation.

3. On the service-selection page, the NameNode and the SNameNode (Secondary NameNode) need to be placed together. I tried separating them in order to set up HA, but the SNameNode kept going down and startup failed; HA will need more time later.

4. If the JobTracker is not placed together with the NameNode, it will not start.

5. The number of DataNodes cannot be less than the block replication factor; in practice you need >= 3.
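For reference, the replication factor behind this requirement lives in hdfs-site.xml (Ambari exposes it in its HDFS configuration screen); the value 3 below is the usual default, which is why three DataNodes is the practical minimum:

```xml
<!-- hdfs-site.xml: block replication; the cluster needs at least this many DataNodes -->
<property>
  <name>dfs.replication</name>
  <value>3</value>
</property>
```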
6. On the Confirm Hosts step, pay attention to the warnings and resolve all of them before proceeding; some warnings will cause installation errors.

7. Remember the users created during the installation; you will need to use them later.

8. I deployed Hive and the HBase Master on the same node; you can of course separate them. Once everything is set, start the installation.

9. If the installation fails, here is how to reinstall.

First, delete the directories that were already installed on each system:

sh file_cp.sh cmd "rm -rf /usr/lib/hadoop && rm -rf /usr/lib/hbase && rm -rf /usr/lib/zookeeper"

sh file_cp.sh cmd "rm -rf /etc/hadoop && rm -rf /etc/hbase && rm -rf /hadoop && rm -rf /var/log/hadoop"

sh file_cp.sh cmd "rm -rf /etc/ganglia && rm -rf /etc/hcatalog && rm -rf /etc/hive && rm -rf /etc/nagios && rm -rf /etc/sqoop && rm -rf /var/log/hbase && rm -rf /var/log/nagios && rm -rf /var/log/hive && rm -rf /var/log/zookeeper && rm -rf /var/run/hadoop && rm -rf /var/run/hbase && rm -rf /var/run/zookeeper"

Then remove the installed packages with yum:

sh file_cp.sh cmd "yum -y remove ambari-log4j hadoop hadoop-lzo hbase hive libconfuse nagios sqoop zookeeper"

The file_cp.sh used here is a shell script I wrote myself to make it easy to run the same command across multiple machines:
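The author's script is not shown in the article; here is a hypothetical sketch of what such a helper could look like, assuming the passwordless ssh from note 1. The host list and the DRY_RUN switch are my assumptions, not the author's script.

```shell
#!/bin/sh
# Sketch of a file_cp.sh-style helper: run one command on every host in the
# cluster over passwordless ssh. Host list is assumed from this article.
HOSTS="cloud100 cloud101 cloud102 cloud103 cloud104 cloud105 cloud106 cloud107"

run_on_all() {
    # $1 is the command string to run on each host; with DRY_RUN set,
    # print the ssh invocation instead of executing it.
    for h in $HOSTS; do
        echo "== $h =="
        if [ -n "$DRY_RUN" ]; then
            echo "ssh $h $1"
        else
            ssh "$h" "$1"
        fi
    done
}

# Usage: sh file_cp.sh cmd "<command string>"
if [ "$1" = "cmd" ] && [ -n "$2" ]; then
    run_on_all "$2"
fi
```

With this in place, the cleanup commands above simply fan out to every node in turn.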


Then reset the Ambari server:

ambari-server stop


ambari-server reset


ambari-server start

10. Note that unsynchronized time across the nodes can cause the HBase RegionServers to go down.
11. iptables needs to be turned off. Since machines may be rebooted, `service iptables stop` alone is not enough; you also need to disable it with chkconfig.
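On CentOS 6, turning the firewall off both for the current session and across reboots looks like this (run on every node; these are standard SysV-init commands, not taken from the article):

```shell
# Stop iptables for the current session
service iptables stop

# Prevent it from starting again on boot
chkconfig iptables off

# Verify it is now off in every runlevel
chkconfig --list iptables
```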

After the installation finally completes, log in at the following address to view the services:

http://<management-node-ip>:8080 (in my case the management node is cloud108). To log in, enter the account and password that you set when installing ambari-server.
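Besides the web UI, Ambari exposes a REST API on the same port, which is handy for a quick health check from the shell. This is a generic example, not from the article; it assumes the default admin/admin credentials and the cloud108 management node from this environment:

```shell
# List the clusters this Ambari server manages (default credentials assumed)
curl -u admin:admin http://cloud108:8080/api/v1/clusters
```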

See ganglia monitoring

See nagios monitoring


After installation completes, everything above looks normal. Do you still need to test it yourself? Ambari runs smoke tests after installation and they basically passed, so things should be fine, but it is still worth exercising the services by hand.

Verify HDFS
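The article leaves the verification steps as headings. A minimal hand check of HDFS, assuming the user you run as can write to /tmp in HDFS, might look like this:

```shell
# Create a directory, upload a file, read it back, then inspect the result
hadoop fs -mkdir /tmp/ambari-test
hadoop fs -put /etc/hosts /tmp/ambari-test/
hadoop fs -cat /tmp/ambari-test/hosts
hadoop fs -ls /tmp/ambari-test

# Overall filesystem health report (tail just the summary)
hadoop fsck / | tail -n 20
```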

Verify Map / Reduce
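For Map/Reduce, the examples jar bundled with the Hadoop packages gives a quick end-to-end job. The jar path below is an assumption based on the /usr/lib/hadoop install directory listed earlier; check that directory for the exact jar name on your version:

```shell
# Run the classic pi estimator: 10 map tasks, 100 samples each.
# A successful run exercises the JobTracker and the TaskTrackers.
hadoop jar /usr/lib/hadoop/hadoop-examples.jar pi 10 100
```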

Verify HBase
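A few commands in the HBase shell are enough to confirm that the Master and RegionServers are serving. The table name t1 here is arbitrary, not from the article:

```shell
# Drive the HBase shell non-interactively via a heredoc
hbase shell <<'EOF'
status
create 't1', 'cf'
put 't1', 'row1', 'cf:a', 'value1'
scan 't1'
disable 't1'
drop 't1'
EOF
```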

Verify Hive
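For Hive, creating and querying a small table exercises both the metastore and the Map/Reduce execution path (the COUNT query launches a Map/Reduce job). The table name test_t is arbitrary, not from the article:

```shell
hive -e "CREATE TABLE test_t (id INT, name STRING);
         SHOW TABLES;
         SELECT COUNT(*) FROM test_t;
         DROP TABLE test_t;"
```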

Summary

At this point the configuration of Hadoop and related components such as HBase and Hive is complete, and what remains is stress testing and other kinds of testing. The Hadoop that ships with Ambari is Hortonworks' rpm packaging of the relevant sources, so there may be some differences from other versions, but as a development environment that has little impact for now. I have not yet used it in production, so I cannot say how stable it is; as I develop projects with it, I will list the bugs I encounter.

Overall, Ambari is well worth using. It eliminates a lot of unnecessary configuration time, and compared with a standalone environment it lets you do performance testing and tuning in a clustered environment much closer to production. The bundled Ganglia and Nagios monitoring configuration also lets us view metrics for the cluster. In general I recommend it; bugs are inevitable in something new, but they will be fixed as people use it. If I have time, I would also like to extend ambari-server, for example by adding monitoring options for common high-performance components such as redis and nginx. That's all for now; welcome, Ambari.

// Update:

Recently I ran into some problems with Ambari:

1. After turning on the append option in the custom configuration, appends still do not work.