Using Ambari to Rapidly Deploy a Hadoop Big Data Environment
     
  Add Date : 2018-11-21      
         
       
         
  Foreword

I have been doing big-data back-end development for the past year, and as the Hadoop community keeps evolving, I keep trying new things. This article focuses on Ambari, a new Apache project designed to let you easily and rapidly configure and deploy the components of the Hadoop ecosystem, while also providing maintenance and monitoring functions.

As a novice, let me talk about my own learning experience. When I first started, I naturally Googled Hadoop, downloaded the packages, and installed a standalone setup in my own virtual machine (CentOS 6.3). I used that Hadoop instance for testing: I wrote a few test classes, did some CRUD-style tests, and ran Map/Reduce jobs. At that point I did not understand Hadoop very well; I kept reading articles about the overall architecture, and all I really did myself was modify a few configuration files under conf so that Hadoop would run normally.

After that stage I also used HBase, another product in the Hadoop ecosystem: again modify the configuration, bring the services up with start-all.sh and start-hbase.sh, then modify my programs and test. Along with HBase I picked up ZooKeeper, Hive, and the like. After this operational phase I began to study Hadoop 2.0, and gradually gained some overall understanding of the Hadoop ecosystem beyond just the technologies involved in my company's projects.

But anyone exploring this as a hobby wants to know more. How does it perform? How exactly does it work? In the slide decks from large companies (Taobao and the like), clusters have dozens, hundreds, or even thousands of nodes. How do they manage them, and what is the performance like? Looking at the performance curves in those slides, can you understand things in detail and tune your own projects? I seem to have found an answer: Ambari, a Hadoop-related project developed by HortonWorks. See the official site for the details.
Learning the Hadoop Ecosystem

Keywords we often see now include: HDFS, MapReduce, HBase, Hive, ZooKeeper, Pig, Sqoop, Oozie, Ganglia, Nagios, CDH3, CDH4, Flume, Scribe, Fluentd, HttpFS, and there are surely more. The Hadoop ecosystem is now fairly prosperous, but who is driving that prosperity? Readers who know Hadoop's history may recall that it started at Yahoo, but today it is maintained mainly by two companies, HortonWorks and Cloudera, which employ most of the committers. That is why the market has two major lines: the CDH series and the community edition. I first used the community edition, later switched to CDH3, and have now switched back to the community edition because of Ambari. Of course, it does not matter much which one you use; as long as your skills are solid, you can modify either to run normally. Enough rambling; let's start installing Ambari.
Beginning the Deployment

First, get to know Ambari. Project address: http://incubator.apache.org/ambari/

Installation documentation: http://incubator.apache.org/ambari/1.2.2/installing-hadoop-using-ambari/content/index.html

HortonWorks wrote an article describing how to install it; my translation is here: http://www.linuxidc.com/Linux/2014-05/101530.htm. Please read the installation documentation seriously before installing, and configure the right package sources for the system version you are currently using. The installation process takes quite a long time, so follow every step of the documentation carefully. Below I describe the problems I ran into.
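As a rough sketch, the core server-side steps on the management node look like the following. The repository URL and version here are assumptions based on the 1.2.2-era docs, so substitute the values from the installation documentation for your OS; the DRY_RUN guard only prints the commands until you set DRY_RUN=0 and run as root.

```shell
# Sketch of the ambari-server install on the management node (CentOS 6).
# DRY_RUN=1 (the default) only prints each command instead of executing it.
DRY_RUN="${DRY_RUN:-1}"
run() {
  if [ "$DRY_RUN" -eq 1 ]; then echo "would run: $*"; else "$@"; fi
}

# Repo URL and version below are assumptions from the 1.2.2 docs; use your own.
run wget -O /etc/yum.repos.d/ambari.repo \
  "http://public-repo-1.hortonworks.com/ambari/centos6/1.x/updates/1.2.2.9/ambari.repo"
run yum -y install ambari-server
run ambari-server setup -s   # -s accepts the defaults (embedded PostgreSQL)
run ambari-server start      # the web UI then listens on port 8080
```

Set DRY_RUN=0 to execute the commands for real.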

Below is my own installation process.

Machine preparation:

My test environment uses nine old HP machines, cloud100 through cloud108, with cloud108 as the management node.

Ambari installation paths:

Installation directories on each machine:

/usr/lib/hadoop

/usr/lib/hbase

/usr/lib/zookeeper

/usr/lib/hcatalog

/usr/lib/hive

Log paths (error details can be found in these directories):

/var/log/hadoop

/var/log/hbase

Configuration file paths:

/etc/hadoop

/etc/hbase

/etc/hive

HDFS storage path:

/hadoop/hdfs

 

Points to note during the installation process:

1. Before installing, you need to set up passwordless SSH login on every machine, as mentioned at http://www.linuxidc.com/Linux/2014-05/101532.htm. Once this is done, the management node can log in to every cluster node this way.
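A minimal sketch of the key setup, using this article's cloud100–cloud107 hostnames and a root login as examples; the actual copy command is left commented out so nothing touches the remote machines until you uncomment it.

```shell
# Generate a key on the management node if one does not exist yet.
KEY="${KEY:-$HOME/.ssh/id_rsa}"
mkdir -p "$(dirname "$KEY")"
[ -f "$KEY" ] || ssh-keygen -t rsa -N '' -f "$KEY" >/dev/null

# Distribute the public key to every worker node (hostnames are examples).
for host in cloud100 cloud101 cloud102 cloud103 cloud104 cloud105 cloud106 cloud107; do
  echo "copy key to $host"
  # ssh-copy-id -i "$KEY.pub" "root@$host"   # uncomment to actually install the key
done
```

Afterwards, verify with `ssh cloud100` from the management node; it should log in without a password prompt.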

2. If Hadoop-related services were installed on your machines before, in particular if HBase's HBASE_HOME environment variable was set, you need to unset those variables, because they will interfere. I had previously put these paths into /etc/profile, which affected HBase, since the paths Ambari installs to may differ from your earlier installation.
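For example (the variable names beyond HBASE_HOME are my guesses at what a previous manual install might have exported):

```shell
# Clear leftover environment variables from a previous manual install.
unset HADOOP_HOME HBASE_HOME HIVE_HOME ZOOKEEPER_HOME
# Also delete or comment out the matching 'export ...' lines in /etc/profile,
# then log out and back in so that new shells no longer pick them up.
```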

3. On the service-selection page, the NameNode and SecondaryNameNode need to be placed on the same node. I tried to split them apart in preparation for HA, but the SecondaryNameNode kept going down and startup failed; HA will have to wait until I have more time.


4. If the JobTracker is not placed together with the NameNode, it will fail to start.

5. The number of DataNodes cannot be less than the block replication factor; basically >= 3 nodes are required.
6. On the Confirm Hosts step, pay attention to the warnings and resolve all of them; some warnings will cause installation errors.

7. Remember the new users created by the installation; you will need to use these users later.

8. I deployed Hive and the HBase Master on the same node; of course, they can also be separated. Once everything is set, start the installation.

9. If the installation fails, here is how to reinstall.

First, delete the directories that were already installed:

sh file_cp.sh cmd "rm -rf /usr/lib/hadoop && rm -rf /usr/lib/hbase && rm -rf /usr/lib/zookeeper"

sh file_cp.sh cmd "rm -rf /etc/hadoop && rm -rf /etc/hbase && rm -rf /hadoop && rm -rf /var/log/hadoop"

sh file_cp.sh cmd "rm -rf /etc/ganglia && rm -rf /etc/hcatalog && rm -rf /etc/hive && rm -rf /etc/nagios && rm -rf /etc/sqoop && rm -rf /var/log/hbase && rm -rf /var/log/nagios && rm -rf /var/log/hive && rm -rf /var/log/zookeeper && rm -rf /var/run/hadoop && rm -rf /var/run/hbase && rm -rf /var/run/zookeeper"

Then remove the related packages installed through yum:

sh file_cp.sh cmd "yum -y remove ambari-log4j hadoop hadoop-lzo hbase hive libconfuse nagios sqoop zookeeper"

Here I use a shell script I wrote myself, which makes it easy to execute commands across multiple machines:

https://github.com/xinqiyang/opshell/tree/master/hadoop
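The script linked above is the author's; a minimal stand-in with the same shape might look like this (the host list and root login are assumptions, and the actual SSH call is commented out):

```shell
# run_all: run one command on every node; a simplified stand-in for file_cp.sh.
HOSTS="${HOSTS:-cloud100 cloud101 cloud102 cloud108}"

run_all() {
  cmd="$1"
  for h in $HOSTS; do
    echo "[$h] $cmd"
    # ssh "root@$h" "$cmd"   # uncomment to execute for real over SSH
  done
}

# Example: preview the cleanup command on every node.
run_all "rm -rf /usr/lib/hadoop"
```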

Finally, reset the Ambari server:

ambari-server stop

 

ambari-server reset

 

ambari-server start


10. Note that clocks must be synchronized; time skew can cause the RegionServer to go down.
11. iptables needs to be turned off. Machines may be rebooted, so it is not enough to stop the service; you also need to disable it with chkconfig.
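The CentOS 6 commands for points 10 and 11 would be roughly these (run as root on every node; the ntpd package is assumed to be installed). The same DRY_RUN guard as earlier only prints them by default:

```shell
# DRY_RUN=1 (the default) only prints each command instead of executing it.
DRY_RUN="${DRY_RUN:-1}"
run() { if [ "$DRY_RUN" -eq 1 ]; then echo "would run: $*"; else "$@"; fi; }

run service iptables stop    # stop the firewall now
run chkconfig iptables off   # and keep it off after reboots
run service ntpd start       # synchronize clocks across the nodes
run chkconfig ntpd on        # RegionServers abort on large clock skew
```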

After the final installation is complete, log in and view the services at:

http://<management node ip>:8080 — for example, mine is http://192.168.1.108:8080/. Before logging in, note that you need the account name and password you entered during the ambari-server setup.



View the Ganglia monitoring page.

View the Nagios monitoring page.

Testing

After the installation completes, check whether everything looks normal. Do we need to test it ourselves? Ambari basically runs smoke tests after installation, and everything passed, but we should still exercise the cluster by hand:

Verify HDFS

Verify Map / Reduce

Verify HBase

Verify Hive
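A few hand-run checks for each of these; the example jar path may differ on your install, and the DRY_RUN guard only prints the commands until you set DRY_RUN=0 on a cluster node:

```shell
# DRY_RUN=1 (the default) only prints each command instead of executing it.
DRY_RUN="${DRY_RUN:-1}"
run() { if [ "$DRY_RUN" -eq 1 ]; then echo "would run: $*"; else "$@"; fi; }

# HDFS: write a file and read it back.
run hadoop fs -mkdir /tmp/smoke
run hadoop fs -put /etc/hosts /tmp/smoke/hosts
run hadoop fs -cat /tmp/smoke/hosts

# Map/Reduce: run the bundled pi example (jar path is an assumption).
run hadoop jar /usr/lib/hadoop/hadoop-examples.jar pi 2 10

# HBase: check cluster status from the shell.
run sh -c "echo status | hbase shell"

# Hive: list the tables.
run hive -e "show tables;"
```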


Summary

At this point, the configuration of Hadoop and related components such as HBase and Hive is complete; what remains is stress testing and other kinds of testing. The Hadoop shipped with Ambari is HortonWorks' packaged rpm build of the Hadoop source, so there may be some differences from other versions. As a development environment this does not have much impact for now, but I have not yet used it in production, so I cannot say how stable it is; as I develop projects with it, I will list the bugs I encounter.

Overall, Ambari is well worth using. It eliminates a lot of unnecessary configuration time, and compared with a standalone setup, a clustered environment is much closer to production for performance testing and tuning. The bundled Ganglia and Nagios monitoring also lets us view data across the cluster. In general, I recommend it: new things inevitably have bugs, but they will keep being fixed as we use them. If I have time, I would like to extend ambari-server's functionality, for example adding monitoring options for common high-performance components such as redis and nginx. That is all for now; welcome, Ambari.

Update:

Some problems I have recently encountered with Ambari:

1. After turning on the append option in the custom configuration, appends still do not work.
     
         
       
         