Software Environment
OS: Ubuntu 14.04.1 LTS (GNU/Linux 3.13.0-32-generic x86_64)
Java: jdk1.7.0_75
Hadoop: hadoop-2.6.0
HBase: hbase-1.0.0
Cluster Machines:
IP           HostName  Master  RegionServer
10.4.20.30   master    yes     no
10.4.20.31   slave1    no      yes
10.4.20.32   slave2    no      yes
Preparation
This article assumes you have already installed Java and deployed a Hadoop cluster; if not, you can refer to the Spark on YARN Cluster Deployment Guide article.
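Before continuing, it may be worth a quick sanity check that the prerequisites are in place. A minimal sketch, assuming java and hadoop are already on your PATH:

# Run on the master; versions should match the environment listed above
java -version                    # expect 1.7.0_75
hadoop version                   # expect 2.6.0
hdfs dfsadmin -report | head     # HDFS should be up, with slave1/slave2 as live datanodes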
Download and Extract
You can download the latest version of HBase from the official download page; the stable binary release is recommended. I downloaded hbase-1.0.0-bin.tar.gz. Make sure the version you download is compatible with your existing Hadoop version (see the compatibility list) and with a supported JDK version (HBase 1.0.x no longer supports JDK 6).
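From the command line, a download might look like the following; the Apache archive URL is an assumption based on the usual archive layout, so adjust it if your mirror path differs:

# Fetch the 1.0.0 binary release (URL is an assumption; verify against the download page)
wget https://archive.apache.org/dist/hbase/hbase-1.0.0/hbase-1.0.0-bin.tar.gz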
Unzip:
tar -zxvf hbase-1.0.0-bin.tar.gz
cd hbase-1.0.0
Configuring HBase
Edit the conf/hbase-env.sh file and set JAVA_HOME to your own JDK path:
# The java implementation to use. Java 1.7+ required.
export JAVA_HOME=/home/spark/workspace/jdk1.7.0_75
Edit the conf/hbase-site.xml file:
<configuration>
  <property>
    <name>hbase.rootdir</name>
    <value>hdfs://master:9000/hbase</value>
  </property>
  <property>
    <name>hbase.cluster.distributed</name>
    <value>true</value>
  </property>
  <property>
    <name>hbase.zookeeper.quorum</name>
    <value>master,slave1,slave2</value>
  </property>
  <property>
    <name>hbase.zookeeper.property.dataDir</name>
    <value>/home/spark/workspace/zookeeper/data</value>
  </property>
</configuration>
The first property specifies the directory where HBase stores its data; its hdfs:// address must be consistent with the one configured in the Hadoop cluster's core-site.xml. The second property specifies HBase's run mode; true means fully distributed. The third property specifies the machines managed by ZooKeeper, usually an odd number of them. The fourth property is the path where ZooKeeper stores its data. Here I use the ZooKeeper that ships with HBase by default.
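To double-check that hbase.rootdir agrees with HDFS, compare it against fs.defaultFS (or the older fs.default.name) in Hadoop's core-site.xml. A quick sketch, assuming Hadoop is installed under ~/workspace/hadoop-2.6.0:

# The host:port shown here must match the hdfs://master:9000 prefix of hbase.rootdir
grep -A 1 'fs.defaultFS\|fs.default.name' ~/workspace/hadoop-2.6.0/etc/hadoop/core-site.xml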
Configure the region servers by adding the following to the conf/regionservers file:
slave1
slave2
The regionservers file lists all machines that run HRegionServer. It is very similar to Hadoop's slaves file: each line gives the hostname of one machine. When HBase starts, every machine listed in this file is started as well; the same applies when HBase is stopped. Our configuration means that RegionServer will be started on slave1 and slave2.
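Because the file is simply one hostname per line, it can also be generated from the shell; a small sketch, run from the hbase-1.0.0 directory:

# Overwrite conf/regionservers with the two slaves
printf 'slave1\nslave2\n' > conf/regionservers
cat conf/regionservers           # verify the contents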
Distribute the configured HBase to the slaves:
scp -r hbase-1.0.0 spark@slave1:~/workspace/
scp -r hbase-1.0.0 spark@slave2:~/workspace/
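With more slaves this is easier as a loop; a sketch assuming passwordless SSH between the nodes is already set up:

for host in slave1 slave2; do
  scp -r hbase-1.0.0 spark@$host:~/workspace/
done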
Modify ulimit restrictions
HBase opens a large number of file handles and processes at the same time, exceeding the default Linux limits, which can lead to errors like the following:
2010-04-06 03:04:37,542 INFO org.apache.hadoop.hdfs.DFSClient: Exception in createBlockOutputStream java.io.EOFException
2010-04-06 03:04:37,542 INFO org.apache.hadoop.hdfs.DFSClient: Abandoning block blk_-6935524980745310745_1391901
So edit the /etc/security/limits.conf file and add the following two lines, which increase the number of open file handles and processes. Note: replace spark with the username under which HBase runs.
spark - nofile 32768
spark - nproc 32000
You also need to add this line to /etc/pam.d/common-session:
session required pam_limits.so
Otherwise, the configuration in /etc/security/limits.conf will not take effect.
Finally, log out (logout or exit) and log back in for the configuration to take effect. Use the ulimit -n -u command to check that the maximum numbers of open files and processes have changed. Remember to do this on every machine that runs HBase.
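To check all nodes from the master in one pass, you can run ulimit over SSH; a sketch assuming passwordless SSH as the spark user:

for host in master slave1 slave2; do
  echo "== $host =="
  ssh spark@$host 'ulimit -n -u'   # expect open files 32768, max user processes 32000
done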
Run HBase
Run on the master:
cd ~/workspace/hbase-1.0.0
bin/start-hbase.sh
Verify the HBase Installation
Running jps on the master should show the HMaster process. Running jps on each slave should show two processes: HQuorumPeer and HRegionServer.
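The same check can be run from the master in one pass; a sketch that uses the full JDK path from hbase-env.sh above, since jps may not be on the PATH of a non-interactive shell:

for host in master slave1 slave2; do
  echo "== $host =="
  ssh spark@$host '/home/spark/workspace/jdk1.7.0_75/bin/jps'
done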
In a browser, visit http://master:16010 to see the HBase Web UI.
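As a final smoke test, you can create, write, and scan a small table through the HBase shell; the table name test and column family cf below are just examples:

# Feed a short script to the HBase shell non-interactively
bin/hbase shell <<'EOF'
status                            # should report the two region servers
create 'test', 'cf'
put 'test', 'row1', 'cf:a', 'value1'
scan 'test'                       # should show row1
disable 'test'
drop 'test'
EOF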