  Spark On YARN cluster installation deployment
     
  Add Date : 2018-11-21      
         
         
         
Recently we needed to set up a complete Spark cluster, so the deployment process is recorded here. Spark officially supports three cluster deployment modes: Standalone, Mesos, and YARN. Standalone is the most convenient, but this article focuses on deployment on YARN.

Software Environment:

Ubuntu 14.04.1 LTS (GNU/Linux 3.13.0-32-generic x86_64)
Hadoop: 2.6.0
Spark: 1.3.0

0. Before You Begin
This walkthrough was done without root privileges, so some commands need sudo; if you are running as root, ignore the sudo. It is recommended to place the downloaded software under the home directory, for example ~/workspace, which is convenient and avoids unnecessary permission problems.
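For example, a minimal sketch of preparing that workspace directory (the user name spark is an assumption carried through the rest of this article):

mkdir -p ~/workspace    # e.g. /home/spark/workspace
cd ~/workspace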

1. Prepare the environment
Modify the hostname
We will set up a cluster with one master and two slaves. First modify the hostname with vi /etc/hostname: set it to master on the master node, to slave1 on the first slave, and likewise slave2 on the second.
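For example, on the first slave (a minimal sketch; the same pattern applies to the other nodes, and a reboot or sudo hostname slave1 makes the change take effect in the current session):

echo slave1 | sudo tee /etc/hostname
sudo hostname slave1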

Configure hosts
Modify the hosts file on every host:

vi /etc/hosts

10.1.1.107 master
10.1.1.108 slave1
10.1.1.109 slave2

After configuring, ping the hostnames to check that the entries take effect:

ping slave1
ping slave2

Passwordless SSH
Install the OpenSSH server:

sudo apt-get install openssh-server

Generate a private/public key pair on every machine:

ssh-keygen -t rsa # press Enter through all the prompts

The machines need to be able to reach each other, so send each machine's id_rsa.pub to the master node; scp can be used to transfer the public keys.

scp ~/.ssh/id_rsa.pub spark@master:~/.ssh/id_rsa.pub.slave1
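The analogous transfer would be run on the second slave (the .slave2 suffix simply keeps the file names distinct on the master):

scp ~/.ssh/id_rsa.pub spark@master:~/.ssh/id_rsa.pub.slave2 # run on slave2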

On the master, append all the public keys to the authorized_keys file:

cat ~/.ssh/id_rsa.pub* >> ~/.ssh/authorized_keys

Distribute the authorized_keys file to each slave:

scp ~/.ssh/authorized_keys spark@slave1:~/.ssh/
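And likewise for the second slave (an analogous command, assuming the same directory layout):

scp ~/.ssh/authorized_keys spark@slave2:~/.ssh/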

Verify passwordless SSH between the machines:

ssh master
ssh slave1
ssh slave2

If the login test fails, you may need to fix the permissions of the authorized_keys file (the permissions matter: sshd refuses to use key files whose permissions are too open):

chmod 600 ~/.ssh/authorized_keys

Install Java
Download the latest Java from the official site; Spark officially only requires Java 6 or later. jdk-7u75-linux-x64.gz is used here.
Extract it directly in the ~/workspace directory:

tar -zxvf jdk-7u75-linux-x64.gz

Edit the environment variables with sudo vi /etc/profile and append the following (replace the paths with your own):

export WORK_SPACE=/home/spark/workspace/
export JAVA_HOME=$WORK_SPACE/jdk1.7.0_75
export JRE_HOME=$WORK_SPACE/jdk1.7.0_75/jre
export PATH=$JAVA_HOME/bin:$JAVA_HOME/jre/bin:$PATH
export CLASSPATH=$CLASSPATH:.:$JAVA_HOME/lib:$JAVA_HOME/jre/lib

Then make the environment variables take effect and verify that Java was installed successfully:

$ source /etc/profile # make the environment variables take effect
$ java -version # if it prints version information like the following, the installation succeeded
java version "1.7.0_75"
Java(TM) SE Runtime Environment (build 1.7.0_75-b13)
Java HotSpot(TM) 64-Bit Server VM (build 24.75-b04, mixed mode)

Install Scala
Spark officially requires Scala 2.10.x, so take care not to pick the wrong version. Version 2.10.4 is used here; download it from the official site.

Likewise, extract it in ~/workspace:

tar -zxvf scala-2.10.4.tgz

Edit the environment variables again with sudo vi /etc/profile and add the following:

export SCALA_HOME=$WORK_SPACE/scala-2.10.4
export PATH=$PATH:$SCALA_HOME/bin

Make the environment variables take effect in the same way and verify that Scala was installed successfully:

$ source /etc/profile # make the environment variables take effect
$ scala -version # if it prints version information like the following, the installation succeeded
Scala code runner version 2.10.4 - Copyright 2002-2013, LAMP / EPFL

Install and Configure Hadoop YARN
Download and extract
Download Hadoop 2.6.0 from the official site or a nearby mirror.

Likewise, extract it in ~/workspace:

tar -zxvf hadoop-2.6.0.tar.gz

Configure Hadoop
Enter the Hadoop configuration directory with cd ~/workspace/hadoop-2.6.0/etc/hadoop. The following seven files need to be configured: hadoop-env.sh, yarn-env.sh, slaves, core-site.xml, hdfs-site.xml, mapred-site.xml, yarn-site.xml.

Configure JAVA_HOME in hadoop-env.sh:

# The java implementation to use.
export JAVA_HOME=/home/spark/workspace/jdk1.7.0_75

Configure JAVA_HOME in yarn-env.sh:

# some Java parameters
export JAVA_HOME=/home/spark/workspace/jdk1.7.0_75

List the IPs or hostnames of the slave nodes in the slaves file:

slave1
slave2

Modify core-site.xml

<configuration>
    <property>
        <name>fs.defaultFS</name>
        <value>hdfs://master:9000/</value>
    </property>
    <property>
        <name>hadoop.tmp.dir</name>
        <value>file:/home/spark/workspace/hadoop-2.6.0/tmp</value>
    </property>
</configuration>
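Optionally pre-create the directory referenced by hadoop.tmp.dir; Hadoop normally creates it when the namenode is formatted, but creating it up front avoids permission surprises (a minimal sketch):

mkdir -p ~/workspace/hadoop-2.6.0/tmp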

Modify hdfs-site.xml

<configuration>
    <property>
        <name>dfs.namenode.secondary.http-address</name>
        <value>master:9001</value>
    </property>
    <property>
        <name>dfs.namenode.name.dir</name>
        <value>file:/home/spark/workspace/hadoop-2.6.0/dfs/name</value>
    </property>
    <property>
        <name>dfs.datanode.data.dir</name>
        <value>file:/home/spark/workspace/hadoop-2.6.0/dfs/data</value>
    </property>
    <property>
        <name>dfs.replication</name>
        <value>3</value>
    </property>
</configuration>

Modify mapred-site.xml

<configuration>
    <property>
        <name>mapreduce.framework.name</name>
        <value>yarn</value>
    </property>
</configuration>

Modify yarn-site.xml

<configuration>
    <property>
        <name>yarn.nodemanager.aux-services</name>
        <value>mapreduce_shuffle</value>
    </property>
    <property>
        <name>yarn.nodemanager.aux-services.mapreduce.shuffle.class</name>
        <value>org.apache.hadoop.mapred.ShuffleHandler</value>
    </property>
    <property>
        <name>yarn.resourcemanager.address</name>
        <value>master:8032</value>
    </property>
    <property>
        <name>yarn.resourcemanager.scheduler.address</name>
        <value>master:8030</value>
    </property>
    <property>
        <name>yarn.resourcemanager.resource-tracker.address</name>
        <value>master:8035</value>
    </property>
    <property>
        <name>yarn.resourcemanager.admin.address</name>
        <value>master:8033</value>
    </property>
    <property>
        <name>yarn.resourcemanager.webapp.address</name>
        <value>master:8088</value>
    </property>
</configuration>

Distribute the configured hadoop-2.6.0 folder to all the slaves:

scp -r ~/workspace/hadoop-2.6.0 spark@slave1:~/workspace/
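And the analogous copy to the second slave (assuming the same directory layout there):

scp -r ~/workspace/hadoop-2.6.0 spark@slave2:~/workspace/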

Start Hadoop
Run the following on the master to start Hadoop:

cd ~/workspace/hadoop-2.6.0 # enter the hadoop directory
bin/hadoop namenode -format # format the namenode
sbin/start-dfs.sh # start dfs
sbin/start-yarn.sh # start yarn
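As a quick sanity check that HDFS came up, listing the root of the filesystem should succeed without errors (a minimal check using the hdfs client shipped in the distribution just extracted):

bin/hdfs dfs -ls /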

Verify that Hadoop was installed successfully
Use the jps command to check that each node started its processes normally. On the master there should be the following processes:

$ jps # run on the master
3407 SecondaryNameNode
3218 NameNode
3552 ResourceManager
3910 Jps

On each slave there should be the following processes:

$ jps # run on the slaves
2072 NodeManager
2213 Jps
1962 DataNode

Alternatively, open http://master:8088 in a browser; the Hadoop management interface should appear, with the slave1 and slave2 nodes visible.
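If no browser is at hand, a quick headless check (assuming curl is installed) is to verify that the ResourceManager web UI responds at all:

curl -sI http://master:8088/ | head -n 1 # should print an HTTP 2xx/3xx status line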

Install Spark
Download and extract
Download the latest version of Spark from the official download page. I downloaded spark-1.3.0-bin-hadoop2.4.tgz.

Extract it in the ~/workspace directory:

tar -zxvf spark-1.3.0-bin-hadoop2.4.tgz
mv spark-1.3.0-bin-hadoop2.4 spark-1.3.0 # the original name is too long, rename it

Configure Spark
cd ~/workspace/spark-1.3.0/conf # enter the spark configuration directory
cp spark-env.sh.template spark-env.sh # copy from the configuration template
vi spark-env.sh # add configuration

Add the following to the end of spark-env.sh (this is my configuration; you can modify it):

export SCALA_HOME=/home/spark/workspace/scala-2.10.4
export JAVA_HOME=/home/spark/workspace/jdk1.7.0_75
export HADOOP_HOME=/home/spark/workspace/hadoop-2.6.0
export HADOOP_CONF_DIR=$HADOOP_HOME/etc/hadoop
SPARK_MASTER_IP=master
SPARK_LOCAL_DIRS=/home/spark/workspace/spark-1.3.0
SPARK_DRIVER_MEMORY=1G

NOTE: when setting the number of CPUs and the amount of memory for the Workers, pay attention to the actual hardware of the machine; if the configuration exceeds what the Worker node actually has, the Worker process will fail to start.
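For example, these two spark-env.sh settings bound what each Worker offers (the values here are assumptions; tune them to your machines):

SPARK_WORKER_CORES=2 # cores each Worker may use
SPARK_WORKER_MEMORY=1g # memory each Worker may give to executors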

Edit the slaves file with vi slaves and fill in the slave hostnames:

slave1
slave2

Distribute the configured spark-1.3.0 folder to all the slaves:

scp -r ~/workspace/spark-1.3.0 spark@slave1:~/workspace/
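And again for the second slave (an analogous command):

scp -r ~/workspace/spark-1.3.0 spark@slave2:~/workspace/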

Start Spark
sbin / start-all.sh

Verify that Spark was installed successfully
Checking with jps, the master should have the following processes:

$ jps
7949 Jps
7328 SecondaryNameNode
7805 Master
7137 NameNode
7475 ResourceManager

On the slave should have the following processes:

$ jps
3132 DataNode
3759 Worker
3858 Jps
3231 NodeManager

Open the Spark web administration page at http://master:8080.

Run the examples
# run in local mode with two threads
./bin/run-example SparkPi 10 --master local[2]

# Spark Standalone cluster mode
./bin/spark-submit \
  --class org.apache.spark.examples.SparkPi \
  --master spark://master:7077 \
  lib/spark-examples-1.3.0-hadoop2.4.0.jar \
  100

# Spark on YARN, yarn-cluster mode (yarn-client also works, see below)
./bin/spark-submit \
    --class org.apache.spark.examples.SparkPi \
    --master yarn-cluster \
    lib/spark-examples*.jar \
    10

Note that Spark on YARN supports two run modes, yarn-cluster and yarn-client. Broadly speaking, yarn-cluster is suited to production environments, while yarn-client is suited to interactive use and debugging, where you want to see the application's output quickly.
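For completeness, a yarn-client submission of the same example would look like this (a sketch; on Spark 1.3 the mode is selected with --master yarn-client, and the jar path is the same wildcard used above):

./bin/spark-submit \
    --class org.apache.spark.examples.SparkPi \
    --master yarn-client \
    lib/spark-examples*.jar \
    10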
     
         
         
         