Building a Hadoop 2.6.3 Cluster on Linux and Developing MapReduce from Windows 7 with Eclipse
     
  Add Date : 2018-11-21      
         
         
         
Recently, in order to analyze common security vulnerabilities of domestic aviation and tourism sites, I thought of using big data to do the analysis. In fact the data set is not large, but I had never used Hadoop in a production project, so this was a chance to actually use it once.

First, a look at the end result: the Hadoop analysis of the proportions of common security vulnerabilities across typical domestic aviation and tourism vendors.
Since this was my first real use of Hadoop, I was bound to run into many problems and consulted a lot of articles on the web. I am recording the whole process from zero here for my own future reference and for anyone else it may help.

I had previously used Storm a little, which suits real-time data; Hadoop is better suited to processing static data. There are plenty of Hadoop tutorials online, but many are dated: some belong to the Hadoop 1.x era, some install on Ubuntu, some only cover theory, and some jump straight to demo code. My own machine runs Win7, so the plan is to build a Hadoop cluster on a Red Hat-series Linux test server and then develop and debug MapReduce programs from the native Win7 machine. Because there is a lot of material, this post mainly covers the theory, the pseudo-distributed and cluster installation process, and the Eclipse plugin installation; a later post will cover an Eclipse development demo and how the aviation/tourism security vulnerabilities were analyzed with Hadoop.

The post is organized into the sections below. The theoretical notes may contain mistakes; they are mainly here so that I (or others) can refer back to them later:

One, Hadoop versions

Two, Hadoop terminology

Three, Passwordless SSH on Linux

Four, Hadoop standalone installation

Five, Running Hadoop in standalone mode

Six, Hadoop pseudo-distributed deployment

Seven, Hadoop cluster deployment

Eight, Eclipse plugin installation

Nine, Problems encountered during installation

One, Hadoop versions

Hadoop has two major versions, 1.x and 2.x. To borrow the official description: Hadoop 1.x consists of a distributed file system, HDFS, and an offline computation framework, MapReduce. HDFS is made up of one NameNode and multiple DataNodes, and MapReduce of one JobTracker and multiple TaskTrackers. Hadoop 2.x contains an HDFS that supports scaling the NameNode horizontally, a resource management system, YARN, and an offline computation framework, MapReduce, that runs on top of YARN. YARN separates the JobTracker's resource management and job control functions into two components, the ResourceManager and the ApplicationMaster: the ResourceManager allocates resources for all applications, while each ApplicationMaster manages only a single application.

The architecture changed substantially between the two versions. The internal details are worth studying if you are interested; for developers, the most visible difference is that the configuration parameter names are not the same.

Two, Hadoop terminology

System roles: ResourceManager, ApplicationMaster, NodeManager

Application name: Job

Programming interfaces: Mapper, Reducer

HDFS: NameNode, DataNode

In the Hadoop 1.x era the system roles were the JobTracker and the TaskTracker; in Hadoop 2.x, YARN replaces these two roles. A TaskTracker ran on every machine in a Map/Reduce cluster. Its main job was to monitor the resources of its own machine and the state of the tasks currently running there, and to report this information to the JobTracker via heartbeats; the JobTracker collected it to decide which machines a newly submitted job should run on.
The ResourceManager is a central service. Its job is to schedule and start the ApplicationMaster that each Job belongs to, and additionally to monitor whether that ApplicationMaster is still alive.

The NodeManager has a narrower role: it maintains the state of the Containers on its node and keeps a heartbeat with the ResourceManager.

The ApplicationMaster is responsible for all the work in a single Job's life cycle, similar to the JobTracker in the old framework. Note that every Job (not every task) has its own ApplicationMaster, and it can run on a machine other than the one hosting the ResourceManager.

The NameNode can be seen as the manager of the distributed file system. It is responsible for the file system namespace, cluster configuration information, and block replication. The NameNode keeps the file system metadata in memory; this metadata includes file information, the block information for each file, and the DataNode information for each block.

DataNodes are the worker nodes of the file system. They store and retrieve data as directed by clients or by the NameNode's scheduling, and they periodically send the NameNode the list of blocks they are storing.

In a Hadoop system, the general master/slaves correspondence is:
master --- NameNode; ResourceManager
slaves --- DataNode; NodeManager
In MapReduce, an application ready to be submitted for execution is called a "job", and a unit of work carved out of a job to run on a compute node is called a "task".

When a Mapper task runs, the input is split into lines and each line is handed to the map function as a key/value pair: the key is the offset of the line from the 0th character of the file, and the value is the content of the line. The output collected from the map function is again converted into key/value pairs, which become the Mapper's output.

The reduce function of the Reducer class accepts the key/value pairs assembled from the Map output: the key is a key emitted by the Map phase, and the values are the collection of values associated with that key.
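
To make the Mapper/Reducer roles concrete, here is a minimal word-count sketch using the org.apache.hadoop.mapreduce API. It is an illustrative example (not the demo from this article): the Mapper receives (line offset, line) pairs and the Reducer receives a key together with all values collected for that key, exactly as described above.

import java.io.IOException;
import org.apache.hadoop.io.IntWritable;
import org.apache.hadoop.io.LongWritable;
import org.apache.hadoop.io.Text;
import org.apache.hadoop.mapreduce.Mapper;
import org.apache.hadoop.mapreduce.Reducer;

public class WordCountSketch {

    // key = offset of the line from the start of the file, value = the line itself
    public static class TokenMapper extends Mapper<LongWritable, Text, Text, IntWritable> {
        private static final IntWritable ONE = new IntWritable(1);
        private final Text word = new Text();

        @Override
        protected void map(LongWritable offset, Text line, Context context)
                throws IOException, InterruptedException {
            for (String token : line.toString().split("\\s+")) {
                if (!token.isEmpty()) {
                    word.set(token);
                    context.write(word, ONE);   // emit (word, 1)
                }
            }
        }
    }

    // key = a word emitted by the Map phase, values = all counts collected for that word
    public static class SumReducer extends Reducer<Text, IntWritable, Text, IntWritable> {
        @Override
        protected void reduce(Text word, Iterable<IntWritable> counts, Context context)
                throws IOException, InterruptedException {
            int sum = 0;
            for (IntWritable c : counts) {
                sum += c.get();
            }
            context.write(word, new IntWritable(sum));   // emit (word, total count)
        }
    }
}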

Three, Passwordless SSH on Linux

When connecting to a Linux server over SSH, besides logging in with an account and password you can also log in with a public/private key pair. Here we set up passwordless SSH mainly so that the Hadoop Master can connect directly to each Slave machine; passwordless SSH itself has nothing to do with Hadoop functionality. It can be set up as follows:

cd ~/.ssh/   # enter the .ssh folder in the current user's home directory

rm ./id_rsa*   # delete any existing key files beginning with id_rsa (there may be none)

ssh-keygen -t rsa   # create the key pair; press Enter at every prompt

cat ./id_rsa.pub >> ./authorized_keys   # append the id_rsa.pub public key to the authorized_keys file in the current directory

Once the key files are created, test it:

ssh Master   # Master is the current machine's hostname; you can also ssh to the current IP. If no password is required, it worked.

Then execute:

scp ~/.ssh/id_rsa.pub hadoop@Slave1:/home/hadoop/   # this assumes that a machine named Slave1 exists, that the current machine's hosts file maps the name Slave1 to its IP, and that Slave1 has a user named hadoop whose home directory is /home/hadoop/. The command copies the id_rsa.pub public key from the .ssh folder in the current user's home directory to /home/hadoop/ on the remote machine Slave1, logging in to the remote machine as the hadoop user.

The command asks for the hadoop user's password on Slave1; once it is entered correctly, id_rsa.pub is transferred to Slave1 and something like the following is displayed:

id_rsa.pub 100% 391 0.4KB/s 00:00

Then, on Slave1, append the Master's public key to the authorized_keys file in the hadoop user's home directory. Run on Slave1:

mkdir ~/.ssh   # create it if it does not exist

cat ~/id_rsa.pub >> ~/.ssh/authorized_keys

rm ~/id_rsa.pub   # it can be deleted once copied

Now test from the Master machine: because the Master's public key has been placed in the designated location for Slave1's hadoop user, Slave1 can be reached without a password.

[hadoop@Master .ssh]$ ssh Slave1

[hadoop@Slave1 ~]$ exit   # the prompt shows we logged in to Slave1 without a password; exit returns to the Master

logout

Connection to Slave1 closed.

[hadoop@Master .ssh]$   # back on the Master machine

Passwordless SSH setup is complete.

(As an aside: the Redis unauthorized-access issue from a while back, i.e. Redis with no password, let attackers connect remotely, change Redis's persistence file location, and write a public key file into a specific directory, which in turn allowed passwordless SSH into the host. The attack works exactly this way: configure a public key and write it into the target directory through Redis.)

Four, Hadoop standalone installation

I ran standalone mode on the test server 172.26.5.187. Change the 187 server's hostname to Master and add the hostname/IP mapping to its hosts file; these commands must be run as root:

vi /etc/sysconfig/network

Change: HOSTNAME=Master

vi /etc/hosts

172.26.5.187 Master

Then create a hadoop user on the 187 server:

useradd -m hadoop -s /bin/bash   # create the hadoop user; -m creates a home directory, -s /bin/bash sets the login shell

passwd hadoop

(type the new password twice)   # set the hadoop user's password

usermod -g root hadoop   # add the hadoop user to the root group

Download the 2.6.3 release from the mirror at http://mirrors.hust.edu.cn/apache/hadoop/common/hadoop-2.6.3/hadoop-2.6.3.tar.gz.

Hadoop will be installed into /usr/local/hadoop. Put hadoop-2.6.3.tar.gz into the /usr/local directory and run:

rm -rf /usr/local/hadoop   # remove any old installation (if it exists)

tar -zxf ~/hadoop-2.6.3.tar.gz -C /usr/local

Rename the extracted folder to hadoop, then change its owner and group to the hadoop user:

chown -R hadoop:hadoop /usr/local/hadoop

Then log in as the hadoop user and run:

cd /usr/local/hadoop

./bin/hadoop version

Output:

[hadoop@Master hadoop]$ ./bin/hadoop version

Hadoop 2.6.3

Subversion https://git-wip-us.apache.org/repos/asf/hadoop.git -r cc865b490b9a6260e9611a5b8633cab885b3d247

Compiled by jenkins on 2015-12-18T01:19Z

Compiled with protoc 2.5.0

From source with checksum 722f77f825e326e13a86ff62b34ada

This command was run using /usr/local/hadoop/share/hadoop/common/hadoop-common-2.6.3.jar

The installation was successful.

Five, Running Hadoop in standalone mode

With hadoop-2.6.3.tar.gz already unpacked to /usr/local/hadoop on the 187 server, run:

mkdir ./input

cp ./etc/hadoop/*.xml ./input

Test with the example program bundled in the examples jar: it scans the files in the input folder for strings matching the regular expression dfs[a-z]+ and writes any matches to the output folder:

[hadoop@Master hadoop]$ ./bin/hadoop jar ./share/hadoop/mapreduce/hadoop-mapreduce-examples-2.6.3.jar grep ./input ./output 'dfs[a-z]+'

This failed with an error saying, in effect, that permissions were insufficient, so run:

chmod -R 744 ./bin/   # make the scripts under ./bin executable

Run it again:

./bin/hadoop jar ./share/hadoop/mapreduce/hadoop-mapreduce-examples-2.6.3.jar grep ./input ./output 'dfs[a-z]+'

After a series of output ......

        File Input Format Counters

                Bytes Read = 123

        File Output Format Counters

                Bytes Written = 23

..........

When it completes, check the result:

[hadoop@p5 hadoop]$ cat ./output/*

dfsadmin

The actual data lives in the part-r-00000 file:

[hadoop@p5 hadoop]$ ls output

part-r-00000 _SUCCESS

Note that Hadoop does not overwrite result files by default, so running the example above again will report an error; you must delete ./output first:

rm -r ./output
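
If you later drive jobs from Java code, a stale output directory can also be removed programmatically before submission instead of with rm. This is only a hedged sketch using the standard FileSystem API; the "output" path is just an example, and with the default (local) configuration it refers to the local file system, while with an HDFS fs.defaultFS it would refer to HDFS.

import org.apache.hadoop.conf.Configuration;
import org.apache.hadoop.fs.FileSystem;
import org.apache.hadoop.fs.Path;

public class CleanOutputDir {
    public static void main(String[] args) throws Exception {
        Configuration conf = new Configuration();   // picks up core-site.xml etc. from the classpath
        FileSystem fs = FileSystem.get(conf);       // local FS or HDFS, depending on fs.defaultFS
        Path output = new Path("output");           // example path, same name as used in this tutorial
        if (fs.exists(output)) {
            fs.delete(output, true);                // true = delete recursively
        }
        fs.close();
    }
}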

The standalone-mode run was successful.

Six, Hadoop pseudo-distributed deployment

First, go to /usr/local/hadoop:

[hadoop@Master hadoop]$ pwd

/usr/local/hadoop

Edit the core-site.xml and hdfs-site.xml configuration files:

vi ./etc/hadoop/core-site.xml

<configuration>
    <property>
        <name>hadoop.tmp.dir</name>
        <value>file:/usr/local/hadoop/tmp</value>
        <description>A base for other temporary directories.</description>
    </property>
    <property>
        <name>fs.defaultFS</name>
        <value>hdfs://172.26.5.187:9000</value>
    </property>
</configuration>

vi ./etc/hadoop/hdfs-site.xml

<configuration>
    <property>
        <name>dfs.replication</name>
        <value>1</value>
    </property>
    <property>
        <name>dfs.namenode.name.dir</name>
        <value>file:/usr/local/hadoop/tmp/dfs/name</value>
    </property>
    <property>
        <name>dfs.datanode.data.dir</name>
        <value>file:/usr/local/hadoop/tmp/dfs/data</value>
    </property>
</configuration>

After configuring, format the NameNode (this only needs to be done once; it does not need to be repeated afterwards):

./bin/hdfs namenode -format

On success you will see the prompts "successfully formatted" and "Exiting with status 0"; "Exiting with status 1" means something went wrong.

Start the NameNode and DataNode daemons:

[hadoop@Master hadoop]$ ./sbin/start-dfs.sh

This may fail with:

bash: ./sbin/start-dfs.sh: Permission denied

If so, add execute permission:

chmod -R 744 ./sbin

Running ./sbin/start-dfs.sh may then fail with:

localhost: Error: JAVA_HOME is not set and could not be found.

Fix it as follows:

[hadoop@Master hadoop]$ vi ./etc/hadoop/hadoop-env.sh

Add:

export JAVA_HOME=/usr/java/jdk1.6.0_38

This sets the JDK path.

Run it again:

[hadoop@p5 hadoop]$ ./sbin/start-dfs.sh

16/01/06 16:05:50 WARN util.NativeCodeLoader: Unable to load native-hadoop library for your platform ... using builtin-java classes where applicable

Starting namenodes on [localhost]

The NativeCodeLoader warning can be ignored; it does not affect normal use.

Use jps to check the current Java processes:

[hadoop@p5 hadoop]$ jps

25978 Jps

25713 DataNode

25876 SecondaryNameNode

25589 NameNode

If any of these processes is missing, startup failed; run ./sbin/stop-dfs.sh to stop everything and check the corresponding log under /usr/local/hadoop/logs/hadoop-hadoop-XXX-Master.log, where XXX is the process name.

You can now open http://172.26.5.187:50070/ in a browser.
Next, run a built-in demo example in pseudo-distributed mode.

First create the HDFS user directory and an input folder:

./bin/hdfs dfs -mkdir -p /user/hadoop

./bin/hdfs dfs -mkdir input

./bin/hdfs dfs -put ./etc/hadoop/*.xml input   # copy all the xml files under ./etc/hadoop/ into the input directory on HDFS

Check that the files were copied:

./bin/hdfs dfs -ls input

Run the same example jar as in standalone mode:

./bin/hadoop jar ./share/hadoop/mapreduce/hadoop-mapreduce-examples-*.jar grep input output 'dfs[a-z.]+'

After a series of output:

File Input Format Counters

        Bytes Read = 219

File Output Format Counters

        Bytes Written = 77

View the output folder on HDFS:

./bin/hdfs dfs -cat output/*

dfsadmin
dfs.replication
dfs.namenode.name.dir
dfs.datanode.data.dir
(The screenshots were taken before the hostname was changed, which is why the machine name still shows p5.)

This time several matching strings were found. You can also fetch the output folder from HDFS back to the local machine:

rm -r ./output   # remove the local output folder (if it exists)

./bin/hdfs dfs -get output ./output   # copy the output folder on HDFS to the local machine

cat ./output/*   # view the contents of the local output folder

That completes the pseudo-distributed run.

Starting Hadoop with ./sbin/start-dfs.sh above only starts the MapReduce environment; we can additionally start YARN and let it take care of resource management and task scheduling.

Edit the files:

mv ./etc/hadoop/mapred-site.xml.template ./etc/hadoop/mapred-site.xml

[hadoop@p5 hadoop]$ vi ./etc/hadoop/mapred-site.xml

<configuration>
    <property>
        <name>mapreduce.framework.name</name>
        <value>yarn</value>
    </property>
</configuration>

[hadoop@p5 hadoop]$ vi ./etc/hadoop/yarn-site.xml

<configuration>
    <property>
        <name>yarn.nodemanager.aux-services</name>
        <value>mapreduce_shuffle</value>
    </property>
</configuration>

Then run:

./sbin/start-yarn.sh   # start YARN

./sbin/mr-jobhistory-daemon.sh start historyserver   # start the history server so that finished tasks can be viewed on the web

Check with jps:

[hadoop@Master hadoop]$ jps

27492 Jps

27459 JobHistoryServer

25713 DataNode

27013 ResourceManager

27283 NodeManager

25876 SecondaryNameNode

25589 NameNode

If startup succeeded, you can view running tasks at http://172.26.5.187:8088/cluster.
If you do not want to start YARN, be sure to rename the mapred-site.xml configuration file back to mapred-site.xml.template.

The scripts to stop YARN are:

./sbin/stop-yarn.sh

./sbin/mr-jobhistory-daemon.sh stop historyserver

Seven, Hadoop cluster deployment

For the cluster test, 172.26.5.187 is the Master and 172.26.5.20 is the Slave.

First, on 187:

[hadoop@Master ~]$ su root

Password:

[root@p5 hadoop]# vi /etc/sysconfig/network

NETWORKING=yes

HOSTNAME=p5

Change to: HOSTNAME=Master

Then edit the hosts file:

[root@p5 hadoop]# vi /etc/hosts

172.26.5.187 Master

172.26.5.20 Slave1

Then on the 20 server:

useradd -m hadoop -s /bin/bash   # create the hadoop user; -m creates a home directory, -s /bin/bash sets the login shell

passwd hadoop

(type the new password twice)

usermod -g root hadoop

Then, as root:

[root@Slave1 ~]# vi /etc/sysconfig/network

NETWORKING=yes

HOSTNAME=p2

Change to: HOSTNAME=Slave1

[root@Slave1 ~]# vi /etc/hosts

172.26.5.20 Slave1

172.26.5.187 Master

Test:

ping Master -c 3

ping Slave1 -c 3

If 187 and 20 can ping each other, the configuration is fine.

On the 187 Master (this was already done during the standalone setup):

cd ~/.ssh

rm ./id_rsa*

ssh-keygen -t rsa   # keep pressing Enter

cat ./id_rsa.pub >> ./authorized_keys

When done, run ssh Master to verify (you may need to type yes; after it succeeds, run exit to return to the original terminal). Then transfer the public key from the Master node to the Slave1 node:

scp ~/.ssh/id_rsa.pub hadoop@Slave1:/home/hadoop/

Enter the password when prompted; once the transfer completes you will see something like:

id_rsa.pub 100% 391 0.4KB/s 00:00

Then, on the Slave1 node (172.26.5.20), add the SSH public key to the authorized keys:

mkdir ~/.ssh   # create the folder if it does not exist; skip this if it already exists

cat ~/id_rsa.pub >> ~/.ssh/authorized_keys

rm ~/id_rsa.pub   # delete it once it has been used

Test from 172.26.5.187 that it can connect to 20 without a password:

[hadoop@Master .ssh]$ ssh Slave1

[hadoop@Slave1 ~]$ exit   # exit back to the 187 server

logout

Connection to Slave1 closed.

[hadoop@Master .ssh]$

Then on 187:

[hadoop@Master .ssh]$ vi ~/.bashrc

export PATH=$PATH:/usr/local/hadoop/bin:/usr/local/hadoop/sbin

Next, modify the configuration files on 187:

[hadoop@Master .ssh]$ cd /usr/local/hadoop/etc/hadoop

[hadoop@Master hadoop]$ vi slaves

Delete localhost and add the line: Slave1

The slaves file lists the hostnames of the DataNode machines, one per line.

Still in the /usr/local/hadoop/etc/hadoop directory on 187, edit the configuration files:

vi core-site.xml

<configuration>
    <property>
        <name>fs.defaultFS</name>
        <value>hdfs://Master:9000</value>
    </property>
    <property>
        <name>hadoop.tmp.dir</name>
        <value>file:/usr/local/hadoop/tmp</value>
        <description>A base for other temporary directories.</description>
    </property>
</configuration>

vi hdfs-site.xml

<configuration>
    <property>
        <name>dfs.namenode.secondary.http-address</name>
        <value>Master:50090</value>
    </property>
    <property>
        <name>dfs.replication</name>
        <value>1</value>
    </property>
    <property>
        <name>dfs.namenode.name.dir</name>
        <value>file:/usr/local/hadoop/tmp/dfs/name</value>
    </property>
    <property>
        <name>dfs.datanode.data.dir</name>
        <value>file:/usr/local/hadoop/tmp/dfs/data</value>
    </property>
</configuration>

vi mapred-site.xml

<configuration>
    <property>
        <name>mapreduce.framework.name</name>
        <value>yarn</value>
    </property>
    <property>
        <name>mapreduce.jobhistory.address</name>
        <value>Master:10020</value>
    </property>
    <property>
        <name>mapreduce.jobhistory.webapp.address</name>
        <value>Master:19888</value>
    </property>
</configuration>

vi yarn-site.xml

<configuration>
    <property>
        <name>yarn.resourcemanager.hostname</name>
        <value>Master</value>
    </property>
    <property>
        <name>yarn.nodemanager.aux-services</name>
        <value>mapreduce_shuffle</value>
    </property>
</configuration>

cd /usr/local

rm -rf ./hadoop/tmp   # delete the Hadoop temporary files

rm -rf ./hadoop/logs/*   # delete the log files

Then compress the hadoop folder on 187 (with its modified configuration files) and send it to the Slave machine, in this case the 20 server:

tar -zcf ~/hadoop.master.tar.gz ./hadoop   # compress into the user's home directory

cd ~

[hadoop@Master ~]$ scp ./hadoop.master.tar.gz Slave1:/home/hadoop   # copy to Slave1

hadoop.master.tar.gz 100% 187MB 11.0MB/s 00:17

After copying, run the following on the Slave1 server (20):

rm -rf /usr/local/hadoop   # remove any old installation (if it exists)

tar -zxf ~/hadoop.master.tar.gz -C /usr/local

chown -R hadoop:hadoop /usr/local/hadoop

Then start everything on 187:

[hadoop@Master hadoop]$ ./sbin/start-dfs.sh

./sbin/start-yarn.sh   # start YARN

./sbin/mr-jobhistory-daemon.sh start historyserver   # start the history server so that finished tasks can be viewed on the web

After starting, you may hit problems.

The NameNode process may fail to start with the error: Storage directory /usr/local/hadoop/tmp/dfs/name does not exist; in that case the NameNode needs to be reformatted.

On 187, run: hdfs namenode -format

You also need to turn off the firewall on both the 187 and 20 servers, otherwise some ports will be unreachable and you will get baffling errors:

[hadoop@Master local]$ service iptables stop

[hadoop@Slave1 local]$ service iptables stop

Start everything on 187 again, then query the cluster status on 187:

[hadoop@Master hadoop]$ hdfs dfsadmin -report

16/01/21 17:55:44 WARN util.NativeCodeLoader: Unable to load native-hadoop library for your platform ... using builtin-java classes where applicable

Configured Capacity: 52844687360 (49.22 GB)

Present Capacity: 44751773696 (41.68 GB)

DFS Remaining: 44741738496 (41.67 GB)

DFS Used: 10035200 (9.57 MB)

DFS Used%: 0.02%

Under replicated blocks: 7

Blocks with corrupt replicas: 0

Missing blocks: 0

-------------------------------------------------

Live datanodes (1):

Name: 172.26.5.20:50010 (Slave1)

Hostname: Slave1

Decommission Status: Normal

Configured Capacity: 52844687360 (49.22 GB)

DFS Used: 10035200 (9.57 MB)

Non DFS Used: 8092913664 (7.54 GB)

DFS Remaining: 44741738496 (41.67 GB)

DFS Used%: 0.02%

DFS Remaining%: 84.67%

Configured Cache Capacity: 0 (0 B)

Cache Used: 0 (0 B)

Cache Remaining: 0 (0 B)

Cache Used%: 100.00%

Cache Remaining%: 0.00%

Xceivers: 1

Last contact: Thu Jan 21 17:55:44 CST 2016

"Live datanodes (1)" means one DataNode is alive, which indicates the cluster started successfully.

Problems can also occur during this step.

For example, the DataNode and NodeManager processes on the 20 server died automatically shortly after starting; the error log showed:

Caused by: java.net.UnknownHostException: p2: p2 ...

This can happen if SSH was started before the hostname change took effect; change the HOSTNAME value in /etc/sysconfig/network, then reconnect and start again.

Finally, check the running services. On 187:

[hadoop@Master hadoop]$ jps

10499 ResourceManager

10801 Jps

10770 JobHistoryServer

10365 SecondaryNameNode

10188 NameNode

And on 20:

[hadoop@Slave1 ~]$ jps

4977 NodeManager

5133 Jps

4873 DataNode

This indicates the startup was successful.

Now run the bundled demo program on the cluster, as before.

On the 187 server (if you have run it before, first delete the output folder: ./bin/hdfs dfs -rm -r output):

hdfs dfs -mkdir -p /user/hadoop

hdfs dfs -mkdir input

hdfs dfs -put /usr/local/hadoop/etc/hadoop/*.xml input

[hadoop@Master hadoop]$ hadoop jar /usr/local/hadoop/share/hadoop/mapreduce/hadoop-mapreduce-examples-*.jar grep input output 'dfs[a-z.]+'

This may fail with: java.net.NoRouteToHostException: No route to host

If so, make sure the firewalls on both the 187 and 20 servers are off. Check the firewall status as root:

service iptables status

If necessary, turn the firewall off as root: service iptables stop

Then, on 187:

[hadoop@Master hadoop]$ ./bin/hdfs dfs -put ./etc/hadoop/*.xml input

[hadoop@Master hadoop]$ ./bin/hdfs dfs -rm -r output

[hadoop@Master hadoop]$ ./bin/hdfs dfs -ls input

[hadoop@Master hadoop]$ ./bin/hadoop jar ./share/hadoop/mapreduce/hadoop-mapreduce-examples-*.jar grep input output 'dfs[a-z.]+'

The cluster test was successful!

Eight, Eclipse plugin installation

To compile and run MapReduce programs in Eclipse you need the hadoop-eclipse-plugin. Its source lives at https://github.com/winghc/hadoop2x-eclipse-plugin, and the release directory there contains prebuilt jars such as hadoop-eclipse-plugin-2.6.0.jar and a couple of other versions. I obtained hadoop-eclipse-plugin-2.6.3.jar through other channels and installed it into MyEclipse.

In MyEclipse, go to Window -> Preferences -> Hadoop Map/Reduce and set the Hadoop installation directory to a local copy of Hadoop on the Win7 machine; for example, I unpacked Hadoop to D:\hadoop-2.6.3.

Then go to Window -> Show View -> Other and select Map/Reduce. In the panel, right-click and choose New Hadoop Location. In the General tab, because fs.defaultFS was set to hdfs://172.26.5.187:9000 earlier, set the DFS Master port to 9000; the Location Name can be anything (I used 187Hadoop), and the Map/Reduce (V2) Master host is 172.26.5.187. Click Finish to complete the configuration.

After this, a DFS Locations entry appears in the Project Explorer; double-clicking 187Hadoop lets you browse the HDFS files of the 187 cluster.
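
The DFS Locations view is essentially doing what the HDFS Java API does. As a rough sketch (assuming the hdfs://172.26.5.187:9000 address configured above and the /user/hadoop/input directory created earlier; the class name and path are only examples, not something prescribed by the plugin), the same listing can be done from code on the Win7 side:

import java.net.URI;
import org.apache.hadoop.conf.Configuration;
import org.apache.hadoop.fs.FileStatus;
import org.apache.hadoop.fs.FileSystem;
import org.apache.hadoop.fs.Path;

public class ListHdfsInput {
    public static void main(String[] args) throws Exception {
        Configuration conf = new Configuration();
        // connect to the remote NameNode configured in core-site.xml above
        FileSystem fs = FileSystem.get(URI.create("hdfs://172.26.5.187:9000"), conf);
        for (FileStatus status : fs.listStatus(new Path("/user/hadoop/input"))) {
            System.out.println(status.getPath() + "  " + status.getLen() + " bytes");
        }
        fs.close();
    }
}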

However, connecting from Win7 to the remote Linux Hadoop cluster can fail. For example, running an application from MyEclipse later may report the following error:

....................................

INFO client.RMProxy: Connecting to ResourceManager at Master/172.26.5.187:8032

INFO mapreduce.JobSubmitter: Cleaning up the staging area /tmp/hadoop-yarn/staging/SL/.staging/job_1452581976741_0001

Exception in thread "main" org.apache.hadoop.security.AccessControlException: Permission denied: user=SL, access=EXECUTE, inode="/tmp":hadoop:supergroup:drwxrwx---

at org.apache.hadoop.hdfs.server.namenode.FSPermissionChecker.checkFsPermission(FSPermissionChecker.java:271)

......................

Caused by: org.apache.hadoop.ipc.RemoteException(org.apache.hadoop.security.AccessControlException): Permission denied: user=SL, access=EXECUTE, inode="/tmp":hadoop:supergroup:drwxrwx---

at org.apache.hadoop.hdfs.server.namenode.FSPermissionChecker.checkFsPermission(FSPermissionChecker.java:271)

..........................

Looking at the error, user=SL is the username of my Win7 login. Several solutions are described online; I used the simplest one: set a Win7 system environment variable HADOOP_USER_NAME=hadoop.
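
If you would rather not set a machine-wide environment variable, a commonly mentioned alternative is to set the same name as a JVM system property at the very top of the driver, before any Hadoop classes are used. This assumes the Hadoop 2.x client falls back to the HADOOP_USER_NAME property when the environment variable is absent, so treat it as an option to verify rather than a guaranteed API:

public class RunAsHadoopUser {
    public static void main(String[] args) throws Exception {
        // Assumption: the Hadoop 2.x client also reads HADOOP_USER_NAME as a
        // system property when the environment variable is not set.
        System.setProperty("HADOOP_USER_NAME", "hadoop");
        // ... then build the Configuration / Job and submit as usual ...
    }
}
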
Nine, Problems encountered during installation

The problems hit at each step were already described above; here is a summary:

1. The output directory must be deleted before each run of a MapReduce program, for example:

./bin/hdfs dfs -rm -r output   # delete the output folder

2. Error: java.net.NoRouteToHostException: No route to host

Solution: the firewall is probably still on, blocking network access. Turn off the firewall on every server in the cluster: service iptables stop (note that this turns the firewall off completely; in a production environment it is better to open access only for the specific ports that are needed).

3. Running ./sbin/start-dfs.sh reports: localhost: Error: JAVA_HOME is not set and could not be found.

Solution: vi ./etc/hadoop/hadoop-env.sh and add:

export JAVA_HOME=/usr/java/jdk1.6.0_38

4. If, after running ./sbin/start-dfs.sh, the NameNode, DataNode or SecondaryNameNode process does not start successfully, check the corresponding log under the /usr/local/hadoop/logs/ directory.

5. If fs.defaultFS in etc/hadoop/core-site.xml is configured as hdfs://localhost:9000 or similar, other servers may be unable to telnet to port 9000, which causes strange problems.

Solution: run netstat -ntl on 187 and check the output. For example:

tcp 0 0 127.0.0.1:9000 0.0.0.0:* LISTEN

means port 9000 is listening only on 127.0.0.1, so only the local machine can connect to it and other servers cannot. If instead it shows:

tcp 0 0 0.0.0.0:9000 0.0.0.0:* LISTEN

then any machine can connect to port 9000.

6. MyEclipse reports an error under DFS Locations when connecting to the Hadoop cluster: An internal error occurred during: "Map/Reduce location status updater".

On 187, cd to /usr/local/hadoop and run: ./bin/hdfs dfs -mkdir -p /user/root/input

./bin/hdfs dfs -mkdir -p /user/root/output. If the problem persists, it may be the plugin jar version, or the HADOOP_USER_NAME system environment variable not being set to a user name that exists on the Master machine.
     
         
         
         