  Highly Available Hadoop Platform - Oozie Workflow
  Add Date : 2018-11-21
  1. Overview

When the Hadoop applications we develop for the business are few and simple, Crontab is enough to schedule them. As the number of tasks grows, however, a system that manages all scheduled tasks in one place becomes necessary, and that is what this post introduces. The agenda is:

Oozie Server
Screenshot Preview
Let's get started.

2. Introduction

This post does not cover the specific details of operating Oozie; its workflow usage will be explored in the next post. Today I will mainly cover what Oozie does, the steps to integrate it, and related topics.

2.1 Role

Oozie is an open-source workflow scheduling system that can manage multiple logically complex Hadoop jobs and execute them in a specified order. Take a typical daily work scenario as an example:

Collect data into HDFS
Write a MapReduce job to clean the data and store the generated result under a specified HDFS path
Create Hive table partitions and load the data into the corresponding partitions
Use HQL to compute the statistical indicators the business needs and output the results into a Hive summary table
Export the aggregated data from the summary table for external business systems to use
By writing the routine process above as a workflow, the system can generate a workflow instance and run that instance on a daily schedule. For Hadoop application scenarios like this, Oozie greatly simplifies task scheduling and execution.
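To make the idea concrete, a daily pipeline like the one above could be expressed as an Oozie workflow definition roughly like the sketch below. The action names, script name, and structure here are hypothetical placeholders for illustration only (workflow details are left to the next post):

```xml
<workflow-app xmlns="uri:oozie:workflow:0.4" name="daily-etl-wf">
    <start to="clean-data"/>
    <!-- Hypothetical MapReduce step that cleans the collected data -->
    <action name="clean-data">
        <map-reduce>
            <job-tracker>${jobTracker}</job-tracker>
            <name-node>${nameNode}</name-node>
        </map-reduce>
        <ok to="load-hive"/>
        <error to="fail"/>
    </action>
    <!-- Hypothetical Hive step that loads partitions and runs the stats HQL -->
    <action name="load-hive">
        <hive xmlns="uri:oozie:hive-action:0.2">
            <job-tracker>${jobTracker}</job-tracker>
            <name-node>${nameNode}</name-node>
            <script>daily_stats.q</script>
        </hive>
        <ok to="end"/>
        <error to="fail"/>
    </action>
    <kill name="fail">
        <message>Workflow failed: [${wf:errorMessage(wf:lastErrorNode())}]</message>
    </kill>
    <end name="end"/>
</workflow-app>
```

Each action declares where control flows on success (`ok`) and on failure (`error`), which is how Oozie encodes the dependencies between the steps.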

2.2 Base Environment

The basic environment used in this post is:

Name       Value
OS         CentOS 6.6
Workflow   Oozie 4.2
Hadoop     Hadoop 2.6

The above is the environment this post depends on. You will also need the JDK, Maven, the MySQL driver, and so on.

3. Oozie Server

Oozie Server provides convenient job management features: through its visual interface we can monitor the running state of jobs, and it also supports building complex Hadoop job pipelines. The dependencies between jobs can be configured in a workflow and executed uniformly by the Oozie Server.

3.1 Preparing Dependencies

Download and unpack Maven with the following commands:

wget http://mirrors.hust.edu.cn/apache/maven/maven-3/3.3.3/binaries/apache-maven-3.3.3-bin.tar.gz

tar -zxvf apache-maven-3.3.3-bin.tar.gz
Then add the environment variables as follows:

export M2_HOME=/home/hadoop/maven-3.3.3
export PATH=$PATH:$M2_HOME/bin
Then run the following command to make the changes take effect immediately:

. /etc/profile
Finally, run the mvn -version command; if the corresponding Maven version number is displayed, the Maven environment is integrated correctly.

Installing and configuring MySQL is relatively simple, so I will not go into details here.

Since Oozie runs inside a Web container, you also need to install the Tomcat Web server; download the corresponding package from the Apache official website. I will not repeat those steps here.

ExtJS toolkit
Oozie's web console depends on the ExtJS visualization toolkit, so we need to download it as well; the download link can be found on the DG_QuickStart page of the Oozie documentation. The address is as follows:

wget http://dev.sencha.com/deploy/ext-2.2.zip
Next, download the Oozie installation package itself; the download address is on Oozie's official website:

wget http://mirrors.cnnic.cn/apache/oozie/4.2.0/oozie-4.2.0.tar.gz
3.2 Oozie Integration

With the environment ready, we can now integrate Oozie. First unpack the downloaded Oozie archive, then build the package with Maven. The commands are as follows:

# Unpack
tar -zxvf oozie-4.2.0.tar.gz

# Enter the source directory
cd oozie-4.2.0

# Package
mvn clean package assembly:single -DskipTests
Note: before building, the pom files need to be modified so that the JDK, Hadoop, HBase, Hive, and other component version numbers are consistent with the versions you are actually using.

The resulting distribution is generated under the distro/target directory.

At this point, modify the Oozie environment variables as follows:

export OOZIE_HOME=/home/hadoop/oozie-4.2.0/distro/target/oozie-4.2.0-distro/oozie-4.2.0
export PATH=$PATH:$OOZIE_HOME/bin
Next, create a folder named libext under the $OOZIE_HOME directory to hold the ExtJS and Hadoop JAR files: copy the ExtJS archive downloaded earlier and the JARs from Hadoop's share directory into libext. Because we use MySQL to store Oozie's metadata, the MySQL driver package is needed as well, so copy the MySQL driver JAR into the libext directory too.
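The libext preparation described above can be sketched as the following shell commands. The paths assume the layout used in this post; the Hadoop home directory and the exact driver filename are assumptions you should adjust to your own setup:

```shell
# Assumed locations; adjust to your installation.
OOZIE_HOME=/home/hadoop/oozie-4.2.0/distro/target/oozie-4.2.0-distro/oozie-4.2.0
HADOOP_HOME=/home/hadoop/hadoop-2.6.0

# Create the libext folder under $OOZIE_HOME
mkdir -p "$OOZIE_HOME/libext"

# ExtJS archive downloaded earlier (kept as a zip, not unpacked)
cp ext-2.2.zip "$OOZIE_HOME/libext/"

# Copy the Hadoop JARs from the share directory
find "$HADOOP_HOME/share/hadoop" -name '*.jar' -exec cp {} "$OOZIE_HOME/libext/" \;

# MySQL driver for the Oozie metadata store
cp mysql-connector-java-5.1.32-bin.jar "$OOZIE_HOME/libext/"
```

The find command is used here instead of a plain cp because Hadoop's share directory nests its JARs several levels deep.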

After these preparations are finished, run the setup. The commands are as follows:

# Run in the $OOZIE_HOME/bin directory
./oozie-setup.sh prepare-war
If output like the following is generated, the setup succeeded:

[hadoop@nna bin]$ ./oozie-setup.sh prepare-war
  setting CATALINA_OPTS="$CATALINA_OPTS -Xmx1024m"

INFO: Adding extension: /home/hadoop/oozie-4.2.0/distro/target/oozie-4.2.0-distro/oozie-4.2.0/libext/commons-configuration-1.6.jar
INFO: Adding extension: /home/hadoop/oozie-4.2.0/distro/target/oozie-4.2.0-distro/oozie-4.2.0/libext/hadoop-auth-2.6.0.jar
INFO: Adding extension: /home/hadoop/oozie-4.2.0/distro/target/oozie-4.2.0-distro/oozie-4.2.0/libext/hadoop-common-2.6.0.jar
INFO: Adding extension: /home/hadoop/oozie-4.2.0/distro/target/oozie-4.2.0-distro/oozie-4.2.0/libext/hadoop-common-2.6.0-tests.jar
INFO: Adding extension: /home/hadoop/oozie-4.2.0/distro/target/oozie-4.2.0-distro/oozie-4.2.0/libext/hadoop-hdfs-2.6.0.jar
INFO: Adding extension: /home/hadoop/oozie-4.2.0/distro/target/oozie-4.2.0-distro/oozie-4.2.0/libext/hadoop-hdfs-2.6.0-tests.jar
INFO: Adding extension: /home/hadoop/oozie-4.2.0/distro/target/oozie-4.2.0-distro/oozie-4.2.0/libext/hadoop-hdfs-nfs-2.6.0.jar
INFO: Adding extension: /home/hadoop/oozie-4.2.0/distro/target/oozie-4.2.0-distro/oozie-4.2.0/libext/hadoop-mapreduce-client-app-2.6.0.jar
INFO: Adding extension: /home/hadoop/oozie-4.2.0/distro/target/oozie-4.2.0-distro/oozie-4.2.0/libext/hadoop-mapreduce-client-common-2.6.0.jar
INFO: Adding extension: /home/hadoop/oozie-4.2.0/distro/target/oozie-4.2.0-distro/oozie-4.2.0/libext/hadoop-mapreduce-client-core-2.6.0.jar
INFO: Adding extension: /home/hadoop/oozie-4.2.0/distro/target/oozie-4.2.0-distro/oozie-4.2.0/libext/hadoop-mapreduce-client-hs-2.6.0.jar
INFO: Adding extension: /home/hadoop/oozie-4.2.0/distro/target/oozie-4.2.0-distro/oozie-4.2.0/libext/hadoop-mapreduce-client-hs-plugins-2.6.0.jar
INFO: Adding extension: /home/hadoop/oozie-4.2.0/distro/target/oozie-4.2.0-distro/oozie-4.2.0/libext/hadoop-mapreduce-client-jobclient-2.6.0.jar
INFO: Adding extension: /home/hadoop/oozie-4.2.0/distro/target/oozie-4.2.0-distro/oozie-4.2.0/libext/hadoop-mapreduce-client-jobclient-2.6.0-tests.jar
INFO: Adding extension: /home/hadoop/oozie-4.2.0/distro/target/oozie-4.2.0-distro/oozie-4.2.0/libext/hadoop-mapreduce-client-shuffle-2.6.0.jar
INFO: Adding extension: /home/hadoop/oozie-4.2.0/distro/target/oozie-4.2.0-distro/oozie-4.2.0/libext/hadoop-mapreduce-examples-2.6.0.jar
INFO: Adding extension: /home/hadoop/oozie-4.2.0/distro/target/oozie-4.2.0-distro/oozie-4.2.0/libext/hadoop-nfs-2.6.0.jar
INFO: Adding extension: /home/hadoop/oozie-4.2.0/distro/target/oozie-4.2.0-distro/oozie-4.2.0/libext/hadoop-yarn-api-2.6.0.jar
INFO: Adding extension: /home/hadoop/oozie-4.2.0/distro/target/oozie-4.2.0-distro/oozie-4.2.0/libext/hadoop-yarn-applications-distributedshell-2.6.0.jar
INFO: Adding extension: /home/hadoop/oozie-4.2.0/distro/target/oozie-4.2.0-distro/oozie-4.2.0/libext/hadoop-yarn-applications-unmanaged-am-launcher-2.6.0.jar
INFO: Adding extension: /home/hadoop/oozie-4.2.0/distro/target/oozie-4.2.0-distro/oozie-4.2.0/libext/hadoop-yarn-client-2.6.0.jar
INFO: Adding extension: /home/hadoop/oozie-4.2.0/distro/target/oozie-4.2.0-distro/oozie-4.2.0/libext/hadoop-yarn-common-2.6.0.jar
INFO: Adding extension: /home/hadoop/oozie-4.2.0/distro/target/oozie-4.2.0-distro/oozie-4.2.0/libext/hadoop-yarn-registry-2.6.0.jar
INFO: Adding extension: /home/hadoop/oozie-4.2.0/distro/target/oozie-4.2.0-distro/oozie-4.2.0/libext/hadoop-yarn-server-applicationhistoryservice-2.6.0.jar
INFO: Adding extension: /home/hadoop/oozie-4.2.0/distro/target/oozie-4.2.0-distro/oozie-4.2.0/libext/hadoop-yarn-server-common-2.6.0.jar
INFO: Adding extension: /home/hadoop/oozie-4.2.0/distro/target/oozie-4.2.0-distro/oozie-4.2.0/libext/hadoop-yarn-server-nodemanager-2.6.0.jar
INFO: Adding extension: /home/hadoop/oozie-4.2.0/distro/target/oozie-4.2.0-distro/oozie-4.2.0/libext/hadoop-yarn-server-resourcemanager-2.6.0.jar
INFO: Adding extension: /home/hadoop/oozie-4.2.0/distro/target/oozie-4.2.0-distro/oozie-4.2.0/libext/hadoop-yarn-server-tests-2.6.0.jar
INFO: Adding extension: /home/hadoop/oozie-4.2.0/distro/target/oozie-4.2.0-distro/oozie-4.2.0/libext/hadoop-yarn-server-web-proxy-2.6.0.jar
INFO: Adding extension: /home/hadoop/oozie-4.2.0/distro/target/oozie-4.2.0-distro/oozie-4.2.0/libext/mysql-connector-java-5.1.32-bin.jar

New Oozie WAR file with added 'ExtJS library, JARs' at /home/hadoop/oozie-4.2.0/distro/target/oozie-4.2.0-distro/oozie-4.2.0/oozie-server/webapps/oozie.war

INFO: Oozie is ready to be started
If the setup fails, handle the problem according to the corresponding prompts.

At this point, the oozie.war file has been generated in the /home/hadoop/oozie-4.2.0/distro/target/oozie-4.2.0-distro/oozie-4.2.0/oozie-server/webapps directory.

3.3 Configure Oozie

Next, modify the oozie-site.xml file in the $OOZIE_HOME/conf directory; the relevant contents are as follows:

    <property>
        <name>oozie.service.JPAService.jdbc.driver</name>
        <value>com.mysql.jdbc.Driver</value>
        <description>JDBC driver class.</description>
    </property>
    <property>
        <name>oozie.service.JPAService.jdbc.url</name>
        <value>jdbc:mysql://nna:3306/oozie</value>
        <description>JDBC URL.</description>
    </property>
    <property>
        <name>oozie.service.JPAService.jdbc.username</name>
        <value>root</value>
        <description>DB user name.</description>
    </property>
    <property>
        <name>oozie.service.JPAService.jdbc.password</name>
        <value>root</value>
        <description>DB user password.</description>
    </property>
Here we create the database manually; if the oozie.service.JPAService.create.db.schema property is set to true, the schema is created automatically instead. The manual creation script is as follows (the database must exist before the grant, hence the CREATE DATABASE statement):

CREATE DATABASE IF NOT EXISTS oozie;
GRANT ALL ON oozie.* TO 'root'@'nna' IDENTIFIED BY 'root';
FLUSH PRIVILEGES;
We then use the following command to generate the data tables:

# Run in the $OOZIE_HOME/bin directory
./ooziedb.sh create -sqlfile oozie.sql -run
This produces output like the following:

[hadoop@nna bin]$ ./ooziedb.sh create -sqlfile oozie.sql -run
  setting CATALINA_OPTS="$CATALINA_OPTS -Xmx1024m"

Validate DB Connection
DB schema does not exist
Check OOZIE_SYS table does not exist
Create SQL schema
Create OOZIE_SYS table

Oozie DB has been created for Oozie version '4.2.0'

The SQL commands have been written to: oozie.sql

If everything executes properly, the oozie.sql script is generated.

3.4 Start

Next, start Oozie with the following commands:

# Run in the $OOZIE_HOME/bin directory
./oozie-start.sh
A normal start looks like this:

[hadoop@nna bin]$ ./oozie-start.sh
WARN: Use of this script is deprecated; use 'oozied.sh start' instead

Setting OOZIE_HOME: /home/hadoop/oozie-4.2.0/distro/target/oozie-4.2.0-distro/oozie-4.2.0
Setting OOZIE_CONFIG: /home/hadoop/oozie-4.2.0/distro/target/oozie-4.2.0-distro/oozie-4.2.0/conf
Sourcing: /home/hadoop/oozie-4.2.0/distro/target/oozie-4.2.0-distro/oozie-4.2.0/conf/oozie-env.sh
  setting CATALINA_OPTS="$CATALINA_OPTS -Xmx1024m"
Setting OOZIE_CONFIG_FILE: oozie-site.xml
Setting OOZIE_DATA: /home/hadoop/oozie-4.2.0/distro/target/oozie-4.2.0-distro/oozie-4.2.0/data
Setting OOZIE_LOG: /home/hadoop/oozie-4.2.0/distro/target/oozie-4.2.0-distro/oozie-4.2.0/logs
Setting OOZIE_LOG4J_FILE: oozie-log4j.properties
Setting OOZIE_HTTP_PORT: 11000
Setting OOZIE_ADMIN_PORT: 11001
Setting OOZIE_HTTPS_PORT: 11443
Setting OOZIE_BASE_URL: http://nna:11000/oozie
Setting CATALINA_BASE: /home/hadoop/oozie-4.2.0/distro/target/oozie-4.2.0-distro/oozie-4.2.0/oozie-server
Setting OOZIE_HTTPS_KEYSTORE_FILE: /home/hadoop/.keystore
Setting CATALINA_OUT: /home/hadoop/oozie-4.2.0/distro/target/oozie-4.2.0-distro/oozie-4.2.0/logs/catalina.out
Setting CATALINA_PID: /home/hadoop/oozie-4.2.0/distro/target/oozie-4.2.0-distro/oozie-4.2.0/oozie-server/temp/oozie.pid

Using CATALINA_OPTS: -Xmx1024m -Dderby.stream.error.file=/home/hadoop/oozie-4.2.0/distro/target/oozie-4.2.0-distro/oozie-4.2.0/logs/derby.log
Adding to CATALINA_OPTS: -Doozie.home.dir=/home/hadoop/oozie-4.2.0/distro/target/oozie-4.2.0-distro/oozie-4.2.0 -Doozie.config.dir=/home/hadoop/oozie-4.2.0/distro/target/oozie-4.2.0-distro/oozie-4.2.0/conf -Doozie.log.dir=/home/hadoop/oozie-4.2.0/distro/target/oozie-4.2.0-distro/oozie-4.2.0/logs -Doozie.data.dir=/home/hadoop/oozie-4.2.0/distro/target/oozie-4.2.0-distro/oozie-4.2.0/data -Doozie.instance.id=nna -Doozie.config.file=oozie-site.xml -Doozie.log4j.file=oozie-log4j.properties -Doozie.log4j.reload=10 -Doozie.http.hostname=nna -Doozie.admin.port=11001 -Doozie.http.port=11000 -Doozie.https.port=11443 -Doozie.base.url=http://nna:11000/oozie -Doozie.https.keystore.file=/home/hadoop/.keystore -Doozie.https.keystore.pass=password -Djava.library.path=

Setting up oozie DB
  setting CATALINA_OPTS="$CATALINA_OPTS -Xmx1024m"

Validate DB Connection
DB schema exists

The SQL commands have been written to: /tmp/ooziedb-9100396876446618885.sql

Using CATALINA_BASE: /home/hadoop/oozie-4.2.0/distro/target/oozie-4.2.0-distro/oozie-4.2.0/oozie-server
Using CATALINA_HOME: /home/hadoop/oozie-4.2.0/distro/target/oozie-4.2.0-distro/oozie-4.2.0/oozie-server
Using CATALINA_TMPDIR: /home/hadoop/oozie-4.2.0/distro/target/oozie-4.2.0-distro/oozie-4.2.0/oozie-server/temp
Using JRE_HOME: /usr/java/jdk1.7
Using CLASSPATH: /home/hadoop/oozie-4.2.0/distro/target/oozie-4.2.0-distro/oozie-4.2.0/oozie-server/bin/bootstrap.jar
Using CATALINA_PID: /home/hadoop/oozie-4.2.0/distro/target/oozie-4.2.0-distro/oozie-4.2.0/oozie-server/temp/oozie.pid
4. Screenshot Preview

Then we can open the Oozie address in a browser to check whether it started normally.
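Besides the browser, a quick check from the shell is also possible. The startup log above shows the web UI on port 11000 of host nna (this post's setup; substitute your own host). The commands below query Oozie's admin status over its REST API and via the Oozie CLI:

```shell
# Query the admin status endpoint of Oozie's REST API
curl -s http://nna:11000/oozie/v1/admin/status

# Or ask the Oozie command-line client for the system status
oozie admin -oozie http://nna:11000/oozie -status
```

A healthy server reports a NORMAL system mode in both cases.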

5. Summary

Integrating Oozie is a little tedious, but this post only covered integrating Oozie itself; follow-up posts will introduce integrating Oozie with the Hadoop cluster and the usage of its workflows.

6. Conclusion

That is all for this post. If you run into any questions while studying, you can join the group discussion or send me an email, and I will do my best to answer. Let us encourage each other!