Hadoop is a distributed-system infrastructure that lets users develop distributed applications without knowing the underlying details of the distributed system.
Hadoop has two important core components: HDFS and MapReduce. HDFS is responsible for storage, MapReduce for computation.
The following describes the key points of installing Hadoop.
Installing Hadoop is actually not much trouble. It mainly requires the prerequisites below; once those are in place, configuring it by following the official website is very simple.
1. A Java runtime environment (the Sun/Oracle release is recommended)
2. Passwordless SSH public key authentication
With that environment ready, all that remains is the Hadoop configuration itself. The details of that part differ between versions; refer to the official documentation.
VM: VMware 10.0.1 build-1379776
Operating system: CentOS 7, 64-bit
Install Java environment
Download: http://www.Oracle.com/technetwork/cn/java/javase/downloads/jdk8-downloads-2133151-zhs.html
Choose the download package that matches your operating system. If your system supports rpm packages, download the rpm directly, or install straight from the rpm URL:
rpm -ivh http://download.oracle.com/otn-pub/java/jdk/8u20-b26/jdk-8u20-linux-x64.rpm
The JDK is continually updated, so to install the latest version, go to the official site yourself to get the rpm URL of the latest installation package.
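After the rpm is installed, it is worth confirming the JDK is actually on the PATH. The exact version string depends on the package you installed:

```shell
# A successful install prints something like:
#   java version "1.8.0_20"
java -version
```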
Configuring passwordless SSH public key authentication
CentOS comes with openssh-server, openssh-clients, and rsync by default; if your system does not have them, install them yourself.
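If any of them are missing, they can be pulled in with yum (run as root; the package names are the stock CentOS 7 ones):

```shell
# Install the SSH server, SSH client tools, and rsync if absent
yum install -y openssh-server openssh-clients rsync
```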
Creating a common account
Create a hadoop account (the name is up to you) on all machines, with the password uniformly set to hadoop as well.
useradd -d /home/hadoop -s /usr/bin/bash -g wheel hadoop
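The useradd above only creates the account; a sketch of setting the agreed password (run as root; note that `--stdin` is a CentOS/RHEL extension of passwd):

```shell
# Set the hadoop account's password non-interactively (CentOS/RHEL only)
echo 'hadoop' | passwd --stdin hadoop
# Or interactively on any distribution:
# passwd hadoop
```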
vi /etc/ssh/sshd_config
Find the following configuration item and change it to the setting below. If it is commented out, just remove the leading # to uncomment it.
# The default is to check both .ssh/authorized_keys and .ssh/authorized_keys2
# But this is overridden so installations will only check .ssh/authorized_keys
AuthorizedKeysFile .ssh/authorized_keys
.ssh/authorized_keys is the path where authorized public keys are stored.
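After editing sshd_config, the change only takes effect once sshd reloads it. The command below is for CentOS 7 (systemd); on CentOS 6 it would be `service sshd restart`:

```shell
# Restart sshd so the AuthorizedKeysFile setting takes effect (run as root)
systemctl restart sshd
```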
Public key generation
Log in as the hadoop account.
ssh-keygen -t rsa -P ''
Save the generated ~/.ssh/id_rsa.pub file as ~/.ssh/authorized_keys:
cp ~/.ssh/id_rsa.pub ~/.ssh/authorized_keys
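The key-setup sequence above can be rehearsed safely in a throwaway directory before touching the real ~/.ssh; the temp path here is purely for illustration:

```shell
# Rehearse the key setup in a temp dir: generate a passphrase-less
# RSA key pair, then install the public key as authorized_keys.
demo=$(mktemp -d)
ssh-keygen -t rsa -N '' -f "$demo/id_rsa" -q
cp "$demo/id_rsa.pub" "$demo/authorized_keys"
chmod 600 "$demo/id_rsa" "$demo/authorized_keys"
ls -l "$demo"
```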
Use the scp command to copy the .ssh directory to the other machines. The lazy approach is to give all machines the same keys, so they share one public key:
scp ~/.ssh/* hadoop@slave1:~/.ssh/
Take care that ~/.ssh/id_rsa has permissions set to 600, so that other users are denied access.
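Concretely, on each machine (these chmod values are the standard ones sshd expects, not anything Hadoop-specific):

```shell
# sshd ignores keys whose files other users can read
chmod 700 ~/.ssh
chmod 600 ~/.ssh/id_rsa ~/.ssh/authorized_keys
```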
For the remaining configuration, refer to the official documentation.