  High-Availability Hadoop Platform - Oozie Workflow
  Add Date : 2018-11-21      
1. Overview

When a business runs only a few simple Hadoop applications, Crontab is enough to schedule them. This post introduces Oozie, a system for managing many scheduled tasks in one place. The contents are:

Oozie Server
Screenshot Preview
Let's get started.

2. Introduction

This post does not cover the details of operating Oozie; its workflows are left for the next post. The focus here is on what Oozie does and on the steps for integrating it.

2.1 Role

Oozie is an open-source workflow scheduling system that manages multiple Hadoop jobs with complex logical dependencies and runs them in a specified order. Take a typical daily routine:

Collect data into HDFS
Write MR jobs to clean the data and store the new data under a specified HDFS path
Create Hive table partitions and load the data into the corresponding partitions
Use HQL to compute business metrics and write the results into a large Hive summary table
Export the summary-table data for external business systems to call
For a routine like this, we can define a workflow in the workflow system, generate a workflow instance from it, and run that instance on a schedule every day. In Hadoop scenarios of this kind, Oozie simplifies task scheduling and execution.
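
The ordering constraint in the routine above can be sketched in plain shell. This is only an illustration of "run the stages in order and stop at the first failure"; it is not how Oozie defines workflows, and the five function names are hypothetical stand-ins that merely print a label instead of calling Hadoop or Hive.

```shell
# Hypothetical stand-ins for the five daily stages; each only prints its name.
collect_to_hdfs()      { echo "collect"; }
clean_with_mr()        { echo "clean"; }
load_hive_partitions() { echo "load"; }
run_hql_statistics()   { echo "stats"; }
export_results()       { echo "export"; }

# Run the stages in the listed order and stop at the first one that fails.
run_daily_pipeline() {
    for stage in collect_to_hdfs clean_with_mr load_hive_partitions \
                 run_hql_statistics export_results; do
        "$stage" || { echo "stage failed: $stage" >&2; return 1; }
    done
}

run_daily_pipeline
```

A script like this has to be scheduled, monitored, and retried by hand, which is the bookkeeping a workflow scheduler is meant to take over.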

2.2 Base Environment

The base environment used in this post is:

Name      Value
OS        CentOS 6.6
Workflow  Oozie 4.2
Hadoop    2.6

This is the environment the post relies on. You also need the JDK, Maven, the MySQL driver, and so on.

3. Oozie Server

Oozie Server provides convenient job management: the running state of jobs can be monitored through its web interface. It also supports building complex Hadoop job flows, in which the dependencies between jobs are configured in a workflow and executed centrally by the Oozie Server.

3.1 Preparing Dependencies

Download and extract Maven with the following commands:

wget http://mirrors.hust.edu.cn/apache/maven/maven-3/3.3.3/binaries/apache-maven-3.3.3-bin.tar.gz

tar -zxvf apache-maven-3.3.3-bin.tar.gz
Then add the environment variables to /etc/profile as follows. M2_HOME must point to the directory Maven was extracted to; the archive unpacks as apache-maven-3.3.3, so rename that directory or adjust the path accordingly:

export M2_HOME=/home/hadoop/maven-3.3.3
export PATH=$PATH:$M2_HOME/bin
Then run the following command to make them take effect immediately:

. /etc/profile
Finally, run mvn -version. If it displays the corresponding Maven version number, the Maven environment is set up correctly.
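
If mvn is not found after sourcing the profile, a quick check of PATH can narrow the problem down. The following is a minimal sketch; path_contains is a hypothetical helper, and the default M2_HOME value is simply the one used in this post.

```shell
# Return 0 if directory $1 is one of the colon-separated entries in $2.
path_contains() {
    case ":$2:" in
        *":$1:"*) return 0 ;;
        *)        return 1 ;;
    esac
}

# Fall back to the path used in this post if M2_HOME is not set.
M2_HOME=${M2_HOME:-/home/hadoop/maven-3.3.3}

if path_contains "$M2_HOME/bin" "$PATH"; then
    echo "Maven bin directory is on PATH"
else
    echo "Maven bin directory is missing from PATH"
fi
```

Wrapping both strings in colons makes the match exact, so /opt/maven/bin does not accidentally match /opt/maven/bin-old.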

Installing and configuring MySQL is relatively simple, so it is not covered here.

Oozie runs inside a web container, so a Tomcat web server is required. Download the installation package from the Apache website; the steps are not repeated here.

ExtJS toolkit
The Oozie web console depends on the ExtJS toolkit, so it must be downloaded as well. The download link is given on the DG_QuickStart page of the Oozie documentation.

The address is as follows:

wget http://dev.sencha.com/deploy/ext-2.2.zip
Next, download the Oozie package itself; the download address is on the Oozie website:

wget http://mirrors.cnnic.cn/apache/oozie/4.2.0/oozie-4.2.0.tar.gz
3.2 Oozie Integration

With the environment ready, we can integrate Oozie. First extract the downloaded Oozie package, then build it with Maven:

# Extract
tar -zxvf oozie-4.2.0.tar.gz

# Enter the directory
cd oozie-4.2.0

# Package
mvn clean package assembly:single -DskipTests
Note: The pom file must be modified first so that the version numbers of the JDK, Hadoop, HBase, Hive, and other components match the versions you are actually using.

The build output is located under distro/target, at the path used for OOZIE_HOME below.

Next, set the Oozie environment variables as follows:

export OOZIE_HOME=/home/hadoop/oozie-4.2.0/distro/target/oozie-4.2.0-distro/oozie-4.2.0
export PATH=$PATH:$OOZIE_HOME/bin
Next, create a libext folder under the $OOZIE_HOME directory to hold the ExtJS and Hadoop JAR files, and copy the ExtJS archive downloaded earlier and the JAR files from the Hadoop share directory into it. Because we use MySQL to store the Oozie metadata, the MySQL driver package is also needed, so copy it into the libext directory as well.
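
The copy step can be sketched as a small shell helper. This is an illustration under assumptions, not the exact commands used in this post: collect_jars is a hypothetical function, and the $HADOOP_HOME/share/hadoop location in the usage comment is assumed from the usual Hadoop 2.x layout.

```shell
# Copy every JAR matching a name pattern from a source tree into a flat
# target directory, creating the target if it does not exist.
#   $1 = source tree, $2 = target directory, $3 = name pattern (default *.jar)
collect_jars() {
    src=$1
    dst=$2
    pattern=${3:-*.jar}
    mkdir -p "$dst" || return 1
    find "$src" -type f -name "$pattern" -exec cp {} "$dst"/ \;
}

# Example invocation (paths are assumptions; adjust them to your layout):
#   collect_jars "$HADOOP_HOME/share/hadoop" "$OOZIE_HOME/libext" 'hadoop-*.jar'
#   cp ext-2.2.zip mysql-connector-java-5.1.32-bin.jar "$OOZIE_HOME/libext/"
```

Restricting the pattern to hadoop-*.jar keeps third-party JARs from the Hadoop tree out of libext; whether you need any of those depends on your build.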

Once everything is in place, start the installation with the following command:

# Run in the $OOZIE_HOME/bin directory
./oozie-setup.sh prepare-war
Output like the following indicates success:

[hadoop@nna bin]$ ./oozie-setup.sh prepare-war
  setting CATALINA_OPTS="$CATALINA_OPTS -Xmx1024m"

INFO: Adding extension: /home/hadoop/oozie-4.2.0/distro/target/oozie-4.2.0-distro/oozie-4.2.0/libext/commons-configuration-1.6.jar
INFO: Adding extension: /home/hadoop/oozie-4.2.0/distro/target/oozie-4.2.0-distro/oozie-4.2.0/libext/hadoop-auth-2.6.0.jar
INFO: Adding extension: /home/hadoop/oozie-4.2.0/distro/target/oozie-4.2.0-distro/oozie-4.2.0/libext/hadoop-common-2.6.0.jar
INFO: Adding extension: /home/hadoop/oozie-4.2.0/distro/target/oozie-4.2.0-distro/oozie-4.2.0/libext/hadoop-common-2.6.0-tests.jar
INFO: Adding extension: /home/hadoop/oozie-4.2.0/distro/target/oozie-4.2.0-distro/oozie-4.2.0/libext/hadoop-hdfs-2.6.0.jar
INFO: Adding extension: /home/hadoop/oozie-4.2.0/distro/target/oozie-4.2.0-distro/oozie-4.2.0/libext/hadoop-hdfs-2.6.0-tests.jar
INFO: Adding extension: /home/hadoop/oozie-4.2.0/distro/target/oozie-4.2.0-distro/oozie-4.2.0/libext/hadoop-hdfs-nfs-2.6.0.jar
INFO: Adding extension: /home/hadoop/oozie-4.2.0/distro/target/oozie-4.2.0-distro/oozie-4.2.0/libext/hadoop-mapreduce-client-app-2.6.0.jar
INFO: Adding extension: /home/hadoop/oozie-4.2.0/distro/target/oozie-4.2.0-distro/oozie-4.2.0/libext/hadoop-mapreduce-client-common-2.6.0.jar
INFO: Adding extension: /home/hadoop/oozie-4.2.0/distro/target/oozie-4.2.0-distro/oozie-4.2.0/libext/hadoop-mapreduce-client-core-2.6.0.jar
INFO: Adding extension: /home/hadoop/oozie-4.2.0/distro/target/oozie-4.2.0-distro/oozie-4.2.0/libext/hadoop-mapreduce-client-hs-2.6.0.jar
INFO: Adding extension: /home/hadoop/oozie-4.2.0/distro/target/oozie-4.2.0-distro/oozie-4.2.0/libext/hadoop-mapreduce-client-hs-plugins-2.6.0.jar
INFO: Adding extension: /home/hadoop/oozie-4.2.0/distro/target/oozie-4.2.0-distro/oozie-4.2.0/libext/hadoop-mapreduce-client-jobclient-2.6.0.jar
INFO: Adding extension: /home/hadoop/oozie-4.2.0/distro/target/oozie-4.2.0-distro/oozie-4.2.0/libext/hadoop-mapreduce-client-jobclient-2.6.0-tests.jar
INFO: Adding extension: /home/hadoop/oozie-4.2.0/distro/target/oozie-4.2.0-distro/oozie-4.2.0/libext/hadoop-mapreduce-client-shuffle-2.6.0.jar
INFO: Adding extension: /home/hadoop/oozie-4.2.0/distro/target/oozie-4.2.0-distro/oozie-4.2.0/libext/hadoop-mapreduce-examples-2.6.0.jar
INFO: Adding extension: /home/hadoop/oozie-4.2.0/distro/target/oozie-4.2.0-distro/oozie-4.2.0/libext/hadoop-nfs-2.6.0.jar
INFO: Adding extension: /home/hadoop/oozie-4.2.0/distro/target/oozie-4.2.0-distro/oozie-4.2.0/libext/hadoop-yarn-api-2.6.0.jar
INFO: Adding extension: /home/hadoop/oozie-4.2.0/distro/target/oozie-4.2.0-distro/oozie-4.2.0/libext/hadoop-yarn-applications-distributedshell-2.6.0.jar
INFO: Adding extension: /home/hadoop/oozie-4.2.0/distro/target/oozie-4.2.0-distro/oozie-4.2.0/libext/hadoop-yarn-applications-unmanaged-am-launcher-2.6.0.jar
INFO: Adding extension: /home/hadoop/oozie-4.2.0/distro/target/oozie-4.2.0-distro/oozie-4.2.0/libext/hadoop-yarn-client-2.6.0.jar
INFO: Adding extension: /home/hadoop/oozie-4.2.0/distro/target/oozie-4.2.0-distro/oozie-4.2.0/libext/hadoop-yarn-common-2.6.0.jar
INFO: Adding extension: /home/hadoop/oozie-4.2.0/distro/target/oozie-4.2.0-distro/oozie-4.2.0/libext/hadoop-yarn-registry-2.6.0.jar
INFO: Adding extension: /home/hadoop/oozie-4.2.0/distro/target/oozie-4.2.0-distro/oozie-4.2.0/libext/hadoop-yarn-server-applicationhistoryservice-2.6.0.jar
INFO: Adding extension: /home/hadoop/oozie-4.2.0/distro/target/oozie-4.2.0-distro/oozie-4.2.0/libext/hadoop-yarn-server-common-2.6.0.jar
INFO: Adding extension: /home/hadoop/oozie-4.2.0/distro/target/oozie-4.2.0-distro/oozie-4.2.0/libext/hadoop-yarn-server-nodemanager-2.6.0.jar
INFO: Adding extension: /home/hadoop/oozie-4.2.0/distro/target/oozie-4.2.0-distro/oozie-4.2.0/libext/hadoop-yarn-server-resourcemanager-2.6.0.jar
INFO: Adding extension: /home/hadoop/oozie-4.2.0/distro/target/oozie-4.2.0-distro/oozie-4.2.0/libext/hadoop-yarn-server-tests-2.6.0.jar
INFO: Adding extension: /home/hadoop/oozie-4.2.0/distro/target/oozie-4.2.0-distro/oozie-4.2.0/libext/hadoop-yarn-server-web-proxy-2.6.0.jar
INFO: Adding extension: /home/hadoop/oozie-4.2.0/distro/target/oozie-4.2.0-distro/oozie-4.2.0/libext/mysql-connector-java-5.1.32-bin.jar

New Oozie WAR file with added 'ExtJS library, JARs' at /home/hadoop/oozie-4.2.0/distro/target/oozie-4.2.0-distro/oozie-4.2.0/oozie-server/webapps/oozie.war

INFO: Oozie is ready to be started
If the command fails, follow the prompts in the error output to resolve the problem.

At this point, the oozie.war file has been generated in the /home/hadoop/oozie-4.2.0/distro/target/oozie-4.2.0-distro/oozie-4.2.0/oozie-server/webapps directory.

3.3 Oozie Configuration

Edit the oozie-site.xml file in the $OOZIE_HOME/conf directory as follows:

<property>
    <name>oozie.service.JPAService.jdbc.driver</name>
    <value>com.mysql.jdbc.Driver</value>
    <description>JDBC driver class.</description>
</property>
<property>
    <name>oozie.service.JPAService.jdbc.url</name>
    <value>jdbc:mysql://nna:3306/oozie</value>
    <description>JDBC URL.</description>
</property>
<property>
    <name>oozie.service.JPAService.jdbc.username</name>
    <value>root</value>
    <description>DB user name.</description>
</property>
<property>
    <name>oozie.service.JPAService.jdbc.password</name>
    <value>root</value>
    <description>DB user password.</description>
</property>
Here the database is created manually. If the oozie.service.JPAService.create.db.schema property is set to true, it is created automatically instead. The manual script is as follows:

GRANT ALL ON oozie.* TO 'root'@'nna' IDENTIFIED BY 'root';
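
Before generating the tables, it can help to confirm that the JDBC URL in oozie-site.xml splits into the host, port, and database you granted access to. The sketch below does this with plain shell parameter expansion; parse_jdbc_mysql_url and the jdbc_* variable names are hypothetical, and the helper assumes a URL of the form jdbc:mysql://host:port/database with an explicit port and no query parameters.

```shell
# Split a URL of the form jdbc:mysql://host:port/database into its parts.
# Sets jdbc_host, jdbc_port, and jdbc_db; assumes the port is present.
parse_jdbc_mysql_url() {
    rest=${1#jdbc:mysql://}     # strip the scheme   -> host:port/database
    hostport=${rest%%/*}        # before first slash -> host:port
    jdbc_db=${rest#*/}          # after first slash  -> database
    jdbc_host=${hostport%%:*}   # before the colon   -> host
    jdbc_port=${hostport##*:}   # after the colon    -> port
}

# The URL configured in oozie-site.xml in this post.
parse_jdbc_mysql_url 'jdbc:mysql://nna:3306/oozie'
echo "host=$jdbc_host port=$jdbc_port db=$jdbc_db"
```

The resulting values can then be passed to the mysql client to test the connection by hand.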
Then use the following command to generate the tables:

# Run in the $OOZIE_HOME/bin directory
./ooziedb.sh create -sqlfile oozie.sql -run
The output is as follows:

[hadoop@nna bin]$ ./ooziedb.sh create -sqlfile oozie.sql -run
  setting CATALINA_OPTS="$CATALINA_OPTS -Xmx1024m"

Validate DB Connection
DB schema does not exist
Check OOZIE_SYS table does not exist
Create SQL schema
Create OOZIE_SYS table

Oozie DB has been created for Oozie version '4.2.0'

The SQL commands have been written to: oozie.sql

If the command runs successfully, the oozie.sql script is generated.

3.4 Start

Next, start Oozie with the following command:

# Run in the $OOZIE_HOME/bin directory
./oozie-start.sh
A normal start looks something like this:

[hadoop@nna bin]$ ./oozie-start.sh
WARN: Use of this script is deprecated; use 'oozied.sh start' instead

Setting OOZIE_HOME: /home/hadoop/oozie-4.2.0/distro/target/oozie-4.2.0-distro/oozie-4.2.0
Setting OOZIE_CONFIG: /home/hadoop/oozie-4.2.0/distro/target/oozie-4.2.0-distro/oozie-4.2.0/conf
Sourcing: /home/hadoop/oozie-4.2.0/distro/target/oozie-4.2.0-distro/oozie-4.2.0/conf/oozie-env.sh
  setting CATALINA_OPTS="$CATALINA_OPTS -Xmx1024m"
Setting OOZIE_CONFIG_FILE: oozie-site.xml
Setting OOZIE_DATA: /home/hadoop/oozie-4.2.0/distro/target/oozie-4.2.0-distro/oozie-4.2.0/data
Setting OOZIE_LOG: /home/hadoop/oozie-4.2.0/distro/target/oozie-4.2.0-distro/oozie-4.2.0/logs
Setting OOZIE_LOG4J_FILE: oozie-log4j.properties
Setting OOZIE_HTTP_PORT: 11000
Setting OOZIE_ADMIN_PORT: 11001
Setting OOZIE_HTTPS_PORT: 11443
Setting OOZIE_BASE_URL: http://nna:11000/oozie
Setting CATALINA_BASE: /home/hadoop/oozie-4.2.0/distro/target/oozie-4.2.0-distro/oozie-4.2.0/oozie-server
Setting OOZIE_HTTPS_KEYSTORE_FILE: /home/hadoop/.keystore
Setting CATALINA_OUT: /home/hadoop/oozie-4.2.0/distro/target/oozie-4.2.0-distro/oozie-4.2.0/logs/catalina.out
Setting CATALINA_PID: /home/hadoop/oozie-4.2.0/distro/target/oozie-4.2.0-distro/oozie-4.2.0/oozie-server/temp/oozie.pid

Using CATALINA_OPTS: -Xmx1024m -Dderby.stream.error.file=/home/hadoop/oozie-4.2.0/distro/target/oozie-4.2.0-distro/oozie-4.2.0/logs/derby.log
Adding to CATALINA_OPTS: -Doozie.home.dir=/home/hadoop/oozie-4.2.0/distro/target/oozie-4.2.0-distro/oozie-4.2.0 -Doozie.config.dir=/home/hadoop/oozie-4.2.0/distro/target/oozie-4.2.0-distro/oozie-4.2.0/conf -Doozie.log.dir=/home/hadoop/oozie-4.2.0/distro/target/oozie-4.2.0-distro/oozie-4.2.0/logs -Doozie.data.dir=/home/hadoop/oozie-4.2.0/distro/target/oozie-4.2.0-distro/oozie-4.2.0/data -Doozie.instance.id=nna -Doozie.config.file=oozie-site.xml -Doozie.log4j.file=oozie-log4j.properties -Doozie.log4j.reload=10 -Doozie.http.hostname=nna -Doozie.admin.port=11001 -Doozie.http.port=11000 -Doozie.https.port=11443 -Doozie.base.url=http://nna:11000/oozie -Doozie.https.keystore.file=/home/hadoop/.keystore -Doozie.https.keystore.pass=password -Djava.library.path=

Setting up oozie DB
  setting CATALINA_OPTS="$CATALINA_OPTS -Xmx1024m"

Validate DB Connection
DB schema exists

The SQL commands have been written to: /tmp/ooziedb-9100396876446618885.sql

Using CATALINA_BASE: /home/hadoop/oozie-4.2.0/distro/target/oozie-4.2.0-distro/oozie-4.2.0/oozie-server
Using CATALINA_HOME: /home/hadoop/oozie-4.2.0/distro/target/oozie-4.2.0-distro/oozie-4.2.0/oozie-server
Using CATALINA_TMPDIR: /home/hadoop/oozie-4.2.0/distro/target/oozie-4.2.0-distro/oozie-4.2.0/oozie-server/temp
Using JRE_HOME: /usr/java/jdk1.7
Using CLASSPATH: /home/hadoop/oozie-4.2.0/distro/target/oozie-4.2.0-distro/oozie-4.2.0/oozie-server/bin/bootstrap.jar
Using CATALINA_PID: /home/hadoop/oozie-4.2.0/distro/target/oozie-4.2.0-distro/oozie-4.2.0/oozie-server/temp/oozie.pid
4. Screenshot Preview

Then open the Oozie address in a browser (the OOZIE_BASE_URL from the startup log, http://nna:11000/oozie) to check that the server started normally.
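
The same check can be made from the command line with the Oozie client's admin -status subcommand, which prints "System mode: NORMAL" when the server is up. The wrapper below is a minimal sketch: oozie_is_up is a hypothetical helper, and it assumes the oozie client script is on PATH (for example via $OOZIE_HOME/bin).

```shell
# Return 0 when the Oozie server at URL $1 reports "System mode: NORMAL".
oozie_is_up() {
    mode_out=$(oozie admin -oozie "$1" -status 2>&1) || return 1
    case "$mode_out" in
        *"System mode: NORMAL"*) return 0 ;;
        *)                       return 1 ;;
    esac
}

# Example (URL taken from OOZIE_BASE_URL in the startup log):
#   oozie_is_up http://nna:11000/oozie && echo "Oozie is running"
```

Any other mode, or a failed connection, makes the function return a non-zero status, so it can be used directly in scripts.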

5. Summary

Integrating Oozie is somewhat tedious. This post covers setting up Oozie on its own; follow-up posts will cover integrating Oozie into a Hadoop cluster and how to use workflows.

6. Conclusion

That is all for this post. If you run into any problems while studying, you can join the discussion group or send me an email, and I will do my best to answer. Let's encourage each other!