Single-node Hadoop installation notes: standalone & pseudo-distributed
     
Add Date: 2017-01-08
         
         
         
  Experimental environment
CentOS 6.X
Hadoop 2.6.0
JDK 1.8.0_65

Purpose
The purpose of this document is to help you quickly install and use Hadoop on a single machine, so that you can get some experience with the Hadoop Distributed File System (HDFS) and the MapReduce framework, for example by running a simple sample job on HDFS.

Prerequisites
Supported Platforms
    GNU/Linux is supported as a development and production platform. Hadoop has been demonstrated on GNU/Linux clusters of up to 2,000 nodes.
    Win32 is supported as a development platform. Distributed operation has not been well tested on Win32, so it is not supported as a production platform.

Install software
If the required software is not yet installed, you have to install it first.
Taking CentOS as an example:
# yum install ssh rsync -y
ssh must be installed and sshd must be kept running, because the Hadoop scripts use them to manage the remote Hadoop daemons.
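Pseudo-distributed operation also expects password-less SSH to localhost for the user that runs Hadoop. A minimal sketch following the stock Apache single-node instructions (run as that user, and skip it if ssh localhost already works without a password):

$ ssh-keygen -t rsa -P '' -f ~/.ssh/id_rsa
$ cat ~/.ssh/id_rsa.pub >> ~/.ssh/authorized_keys
$ chmod 0600 ~/.ssh/authorized_keys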


Create a user
# useradd -m hadoop -s /bin/bash    # create a new user named hadoop
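You will typically also want to set a password for the new account (an assumed extra step; the root-level setup below is still performed as root):

# passwd hadoop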

Hosts resolution
# cat /etc/hosts | grep ocean-lab
192.168.9.70 ocean-lab.ocean.org ocean-lab

Install JDK
JDK - http://www.oracle.com/technetwork/java/javase/downloads/index.html
First install the Java environment:
# wget --no-cookies --no-check-certificate --header "Cookie: gpw_e24=http%3A%2F%2Fwww.oracle.com%2F; oraclelicense=accept-securebackup-cookie" "http://download.oracle.com/otn-pub/java/jdk/8u65-b17/jdk-8u65-linux-x64.rpm"
# rpm -Uvh jdk-8u65-linux-x64.rpm

Configuring Java
# Echo "export JAVA_HOME = / usr / java / jdk1.8.0_65" >> /home/hadoop/.bashrc
# Source /home/hadoop/.bashrc
# Echo $ JAVA_HOME
/usr/java/jdk1.8.0_65
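A quick check that the JDK itself is usable (the printed version string should match the installed build):

# java -version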


Download and install Hadoop
To get a Hadoop distribution, download a recent stable release from one of the Apache mirrors:
# wget http://apache.fayea.com/hadoop/common/hadoop-2.6.0/hadoop-2.6.0.tar.gz


Prepare to run the Hadoop cluster: unpack the downloaded Hadoop distribution, then edit the file etc/hadoop/hadoop-env.sh and, at minimum, set JAVA_HOME to the root of your Java installation.

# tar xf hadoop-2.6.0.tar.gz -C /usr/local

# mv /usr/local/hadoop-2.6.0 /usr/local/hadoop    # optional rename; the rest of these notes keep the versioned path /usr/local/hadoop-2.6.0

Try the following command:
# bin/hadoop
This displays the usage documentation for the hadoop script.

Now you can start a Hadoop cluster in one of the three supported modes:
    Stand-alone mode
    Pseudo-distributed mode
    Fully distributed mode

Stand-alone mode

By default, Hadoop is configured to run in non-distributed mode as a single Java process. This is very helpful for debugging.
Now we can run some examples to get a feel for Hadoop. Hadoop ships with a wealth of examples, including wordcount, terasort, join, grep and the like.
Here we choose to run the grep example: it takes all the files in the input folder as input, filters them for words matching the regular expression dfs[a-z.]+ and counts the occurrences, writing the final result to the output folder.

# mkdir input
# cp etc/hadoop/*.xml input
# ./bin/hadoop jar ./share/hadoop/mapreduce/hadoop-mapreduce-examples-2.6.0.jar grep ./input ./output 'dfs[a-z.]+'
# cat ./output/*
If it executes successfully, a lot of job-related information is printed, ending with the output shown below. The job writes its results to the specified output folder; view them with cat ./output/*. The word dfsadmin, which matches the regular expression, appears once:
[10:57:58] [hadoop@ocean-lab hadoop-2.6.0]$ cat ./output/*
1 dfsadmin


Note that Hadoop does not overwrite result files by default, so running the above example a second time fails; you need to delete ./output first.
Otherwise, an error like the following is reported:
INFO jvm.JvmMetrics: Cannot initialize JVM Metrics with processName=JobTracker, sessionId= - already initialized
org.apache.hadoop.mapred.FileAlreadyExistsException: Output directory file:/usr/local/hadoop-2.6.0/output already exists
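To re-run the example, remove the previous output directory first:

# rm -r ./output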
When prompted "INFO metrics.MetricsUtil: Unable to obtain hostName java.net.UnknowHostException", which need to execute the following command to modify the hosts file, add the IP mapping for your host name:
# Cat / etc / hosts | grep ocean-lab
192.168.9.70 ocean-lab.ocean.org ocean-lab


Pseudo-distributed mode

Hadoop can also run on a single node in a so-called pseudo-distributed mode, where each Hadoop daemon runs as a separate Java process.
The node acts as both NameNode and DataNode, and at the same time reads the files in HDFS.

Before setting up the Hadoop pseudo-distributed configuration, we also need to set the Hadoop environment variables by adding the following to ~/.bashrc:
# Hadoop environment variables
export HADOOP_HOME=/usr/local/hadoop-2.6.0
export HADOOP_INSTALL=$HADOOP_HOME
export HADOOP_MAPRED_HOME=$HADOOP_HOME
export HADOOP_COMMON_HOME=$HADOOP_HOME
export HADOOP_HDFS_HOME=$HADOOP_HOME
export YARN_HOME=$HADOOP_HOME
export HADOOP_COMMON_LIB_NATIVE_DIR=$HADOOP_HOME/lib/native
export PATH=$PATH:$HADOOP_HOME/sbin:$HADOOP_HOME/bin

source ~/.bashrc
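A quick sanity check that the variables took effect (hadoop version is part of the stock CLI):

# echo $HADOOP_HOME
/usr/local/hadoop-2.6.0
# hadoop version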

Configuration

Use the following etc/hadoop/core-site.xml:
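A minimal sketch of the configuration for pseudo-distributed operation, following the stock Apache single-node example; fs.defaultFS at hdfs://localhost:9000 and a replication factor of 1 are the conventional settings, and the hadoop.tmp.dir entry is an assumed addition that keeps HDFS data out of /tmp:

etc/hadoop/core-site.xml:
<configuration>
    <!-- URI of the default file system: the local pseudo-distributed HDFS -->
    <property>
        <name>fs.defaultFS</name>
        <value>hdfs://localhost:9000</value>
    </property>
    <!-- assumed addition: base directory for HDFS data, instead of /tmp -->
    <property>
        <name>hadoop.tmp.dir</name>
        <value>/usr/local/hadoop-2.6.0/tmp</value>
    </property>
</configuration>

The companion etc/hadoop/hdfs-site.xml conventionally sets the replication factor to 1, since there is only one DataNode:

<configuration>
    <property>
        <name>dfs.replication</name>
        <value>1</value>
    </property>
</configuration>

After editing these files, the usual next steps per the stock docs are formatting the filesystem with bin/hdfs namenode -format and starting the daemons with sbin/start-dfs.sh.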