  High-Availability Hadoop Platform - Oozie Workflow
     
  Add Date : 2018-11-21      
         
       
         
  1. Overview

When a business has only a few simple Hadoop applications, Crontab is enough to schedule them. This post introduces Oozie, a system for managing all kinds of scheduled tasks in one place. The outline is as follows:

Introduction
Oozie Server
Screenshot Preview
Let's get started.

2. Introduction

This post does not cover the details of operating Oozie; its workflows are covered in the next post. Here the focus is on the role Oozie plays and the steps for integrating it.

2.1 Role

Oozie is an open-source workflow scheduling system that manages multiple Hadoop jobs with complex logic and runs them in a specified order. Consider a typical daily workload:

Collect data into HDFS
Write MapReduce (MR) jobs to clean the data and store the new data under a specified HDFS path
Create Hive table partitions and load the data into the corresponding partitions
Run HQL to compute business metrics and write the results to the corresponding large Hive table
Export the aggregated data from the large table for external business systems to use
A routine process like this can be written as a workflow that produces a workflow instance, which is then run on a schedule every day. For Hadoop scenarios of this kind, Oozie simplifies task scheduling and execution.
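To make "run in a specified order" concrete, here is a minimal shell sketch of the five steps above. It is not an Oozie workflow definition. The step functions are hypothetical stand-ins that only print a label, and run_pipeline is a made-up helper that stops at the first failing step, which is the basic guarantee a workflow scheduler adds on top of plain Crontab entries.

```shell
# Hypothetical stand-ins for the five daily steps; each one only prints a label.
# In a real deployment each step would be an Oozie action or a hadoop/hive command.
collect_to_hdfs()     { echo "collect"; }
clean_with_mr()       { echo "clean"; }
load_hive_partition() { echo "load"; }
compute_metrics()     { echo "stats"; }
export_results()      { echo "export"; }

# Run the given steps in order and stop at the first one that fails.
run_pipeline() {
  local step
  for step in "$@"; do
    "$step" || { echo "failed: $step" >&2; return 1; }
  done
}

run_pipeline collect_to_hdfs clean_with_mr load_hive_partition compute_metrics export_results
```

With Crontab alone, each of these steps would need its own schedule entry and its own guess about when the previous step has finished.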

2.2 Base environment

The base environment used in this post is:

Name       Value
OS         CentOS6.6
Workflow   Oozie4.2
Hadoop     2.6

This is the base environment this post relies on. You also need the JDK, Maven, the MySQL driver, and so on.

3. Oozie Server

Oozie Server provides convenient job management: the running state of jobs can be managed through its web interface. It also supports building complex Hadoop job flows, in which the dependencies between jobs are configured in a workflow and executed centrally by the Oozie Server.

3.1 Preparing dependencies

Maven
Download and install Maven with the following commands:

wget http://mirrors.hust.edu.cn/apache/maven/maven-3/3.3.3/binaries/apache-maven-3.3.3-bin.tar.gz

tar -zxvf apache-maven-3.3.3-bin.tar.gz
Then add the following environment variables to /etc/profile (M2_HOME must point to the directory where Maven actually lives):

export M2_HOME=/home/hadoop/maven-3.3.3
export PATH=$PATH:$M2_HOME/bin
Then run the following command to make them take effect immediately:

. /etc/profile
Finally, run the mvn -version command. If it prints the corresponding Maven version number, the Maven environment is set up correctly.

MySQL
Installing and configuring the MySQL database is relatively simple, so it is not described in detail here.

Tomcat
Oozie runs inside a web container, so the Tomcat web server must be installed. Download the corresponding installation package from the Apache website; the steps are not repeated here.

ExtJS toolkit
Oozie's web console depends on the ExtJS toolkit, so it has to be downloaded as well. The download link can be found on Oozie's DG_QuickStart page, as shown below:



The address is as follows:

wget http://dev.sencha.com/deploy/ext-2.2.zip
Oozie
Download the Oozie installation package from the address given on Oozie's official website:

wget http://mirrors.cnnic.cn/apache/oozie/4.2.0/oozie-4.2.0.tar.gz
3.2 Oozie Integration

Once the environment is ready, we can integrate Oozie. First extract the downloaded Oozie package, then build it with Maven. The commands are as follows:

# Extract the archive
tar -zxvf oozie-4.2.0.tar.gz

# Enter the source directory
cd oozie-4.2.0

# Build the distribution package
mvn clean package assembly:single -DskipTests
Note: the pom file must be modified before building so that the versions of the JDK, Hadoop, HBase, Hive, and other components match the versions you are actually using.

The build output is located at:

/home/hadoop/oozie-4.2.0/distro/target/oozie-4.2.0-distro/oozie-4.2.0
Now set the Oozie environment variables as follows:

export OOZIE_HOME=/home/hadoop/oozie-4.2.0/distro/target/oozie-4.2.0-distro/oozie-4.2.0
export PATH=$PATH:$OOZIE_HOME/bin
Next, create a libext folder under the $OOZIE_HOME directory to hold the ExtJS archive and the Hadoop JAR files: copy the ExtJS archive downloaded earlier and the JAR files from Hadoop's share directory into libext. Because we use MySQL to store Oozie's metadata, the MySQL driver JAR is also required and must be copied into the libext directory too.
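The libext preparation can be sketched as a small shell function. This is only a sketch under assumptions: prepare_libext is a made-up helper name, the four paths are passed in by the caller, and copying every JAR found under Hadoop's share directory is one possible choice; you may prefer to copy only the JARs you need.

```shell
# prepare_libext OOZIE_HOME HADOOP_SHARE EXTJS_ZIP MYSQL_JAR
# Hypothetical helper: creates libext and fills it with the ExtJS archive,
# the MySQL driver JAR, and the Hadoop JARs.
prepare_libext() {
  local oozie_home=$1    # Oozie install directory ($OOZIE_HOME)
  local hadoop_share=$2  # Hadoop share directory (location is an assumption)
  local extjs_zip=$3     # the downloaded ext-2.2.zip
  local mysql_jar=$4     # the MySQL JDBC driver JAR

  mkdir -p "$oozie_home/libext" || return 1
  cp "$extjs_zip" "$mysql_jar" "$oozie_home/libext/" || return 1
  # Copy every JAR below the share directory, sub-directories included.
  find "$hadoop_share" -name '*.jar' -exec cp {} "$oozie_home/libext/" \;
}

# Example call (HADOOP_HOME and the file locations are assumptions):
#   prepare_libext "$OOZIE_HOME" "$HADOOP_HOME/share/hadoop" \
#       ~/ext-2.2.zip ~/mysql-connector-java-5.1.32-bin.jar
```

JARs with the same file name in different sub-directories overwrite each other in libext, which is harmless when they are identical copies.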

With these preparations done, start the installation with the following command:

# Run from the $OOZIE_HOME/bin directory
./oozie-setup.sh prepare-war
Output like the following indicates success:

[hadoop@nna bin]$ ./oozie-setup.sh prepare-war
  setting CATALINA_OPTS="$CATALINA_OPTS -Xmx1024m"

INFO: Adding extension: /home/hadoop/oozie-4.2.0/distro/target/oozie-4.2.0-distro/oozie-4.2.0/libext/commons-configuration-1.6.jar
INFO: Adding extension: /home/hadoop/oozie-4.2.0/distro/target/oozie-4.2.0-distro/oozie-4.2.0/libext/hadoop-auth-2.6.0.jar
INFO: Adding extension: /home/hadoop/oozie-4.2.0/distro/target/oozie-4.2.0-distro/oozie-4.2.0/libext/hadoop-common-2.6.0.jar
INFO: Adding extension: /home/hadoop/oozie-4.2.0/distro/target/oozie-4.2.0-distro/oozie-4.2.0/libext/hadoop-common-2.6.0-tests.jar
INFO: Adding extension: /home/hadoop/oozie-4.2.0/distro/target/oozie-4.2.0-distro/oozie-4.2.0/libext/hadoop-hdfs-2.6.0.jar
INFO: Adding extension: /home/hadoop/oozie-4.2.0/distro/target/oozie-4.2.0-distro/oozie-4.2.0/libext/hadoop-hdfs-2.6.0-tests.jar
INFO: Adding extension: /home/hadoop/oozie-4.2.0/distro/target/oozie-4.2.0-distro/oozie-4.2.0/libext/hadoop-hdfs-nfs-2.6.0.jar
INFO: Adding extension: /home/hadoop/oozie-4.2.0/distro/target/oozie-4.2.0-distro/oozie-4.2.0/libext/hadoop-mapreduce-client-app-2.6.0.jar
INFO: Adding extension: /home/hadoop/oozie-4.2.0/distro/target/oozie-4.2.0-distro/oozie-4.2.0/libext/hadoop-mapreduce-client-common-2.6.0.jar
INFO: Adding extension: /home/hadoop/oozie-4.2.0/distro/target/oozie-4.2.0-distro/oozie-4.2.0/libext/hadoop-mapreduce-client-core-2.6.0.jar
INFO: Adding extension: /home/hadoop/oozie-4.2.0/distro/target/oozie-4.2.0-distro/oozie-4.2.0/libext/hadoop-mapreduce-client-hs-2.6.0.jar
INFO: Adding extension: /home/hadoop/oozie-4.2.0/distro/target/oozie-4.2.0-distro/oozie-4.2.0/libext/hadoop-mapreduce-client-hs-plugins-2.6.0.jar
INFO: Adding extension: /home/hadoop/oozie-4.2.0/distro/target/oozie-4.2.0-distro/oozie-4.2.0/libext/hadoop-mapreduce-client-jobclient-2.6.0.jar
INFO: Adding extension: /home/hadoop/oozie-4.2.0/distro/target/oozie-4.2.0-distro/oozie-4.2.0/libext/hadoop-mapreduce-client-jobclient-2.6.0-tests.jar
INFO: Adding extension: /home/hadoop/oozie-4.2.0/distro/target/oozie-4.2.0-distro/oozie-4.2.0/libext/hadoop-mapreduce-client-shuffle-2.6.0.jar
INFO: Adding extension: /home/hadoop/oozie-4.2.0/distro/target/oozie-4.2.0-distro/oozie-4.2.0/libext/hadoop-mapreduce-examples-2.6.0.jar
INFO: Adding extension: /home/hadoop/oozie-4.2.0/distro/target/oozie-4.2.0-distro/oozie-4.2.0/libext/hadoop-nfs-2.6.0.jar
INFO: Adding extension: /home/hadoop/oozie-4.2.0/distro/target/oozie-4.2.0-distro/oozie-4.2.0/libext/hadoop-yarn-api-2.6.0.jar
INFO: Adding extension: /home/hadoop/oozie-4.2.0/distro/target/oozie-4.2.0-distro/oozie-4.2.0/libext/hadoop-yarn-applications-distributedshell-2.6.0.jar
INFO: Adding extension: /home/hadoop/oozie-4.2.0/distro/target/oozie-4.2.0-distro/oozie-4.2.0/libext/hadoop-yarn-applications-unmanaged-am-launcher-2.6.0.jar
INFO: Adding extension: /home/hadoop/oozie-4.2.0/distro/target/oozie-4.2.0-distro/oozie-4.2.0/libext/hadoop-yarn-client-2.6.0.jar
INFO: Adding extension: /home/hadoop/oozie-4.2.0/distro/target/oozie-4.2.0-distro/oozie-4.2.0/libext/hadoop-yarn-common-2.6.0.jar
INFO: Adding extension: /home/hadoop/oozie-4.2.0/distro/target/oozie-4.2.0-distro/oozie-4.2.0/libext/hadoop-yarn-registry-2.6.0.jar
INFO: Adding extension: /home/hadoop/oozie-4.2.0/distro/target/oozie-4.2.0-distro/oozie-4.2.0/libext/hadoop-yarn-server-applicationhistoryservice-2.6.0.jar
INFO: Adding extension: /home/hadoop/oozie-4.2.0/distro/target/oozie-4.2.0-distro/oozie-4.2.0/libext/hadoop-yarn-server-common-2.6.0.jar
INFO: Adding extension: /home/hadoop/oozie-4.2.0/distro/target/oozie-4.2.0-distro/oozie-4.2.0/libext/hadoop-yarn-server-nodemanager-2.6.0.jar
INFO: Adding extension: /home/hadoop/oozie-4.2.0/distro/target/oozie-4.2.0-distro/oozie-4.2.0/libext/hadoop-yarn-server-resourcemanager-2.6.0.jar
INFO: Adding extension: /home/hadoop/oozie-4.2.0/distro/target/oozie-4.2.0-distro/oozie-4.2.0/libext/hadoop-yarn-server-tests-2.6.0.jar
INFO: Adding extension: /home/hadoop/oozie-4.2.0/distro/target/oozie-4.2.0-distro/oozie-4.2.0/libext/hadoop-yarn-server-web-proxy-2.6.0.jar
INFO: Adding extension: /home/hadoop/oozie-4.2.0/distro/target/oozie-4.2.0-distro/oozie-4.2.0/libext/mysql-connector-java-5.1.32-bin.jar

New Oozie WAR file with added 'ExtJS library, JARs' at /home/hadoop/oozie-4.2.0/distro/target/oozie-4.2.0-distro/oozie-4.2.0/oozie-server/webapps/oozie.war


INFO: Oozie is ready to be started
If the command fails, follow the prompts in the output to fix the problem.

At this point the oozie.war file has been generated in the /home/hadoop/oozie-4.2.0/distro/target/oozie-4.2.0-distro/oozie-4.2.0/oozie-server/webapps directory.

3.3 Oozie configuration

Modify the oozie-site.xml file in the $OOZIE_HOME/conf directory so that it contains the following:

    <property>
        <name>oozie.service.JPAService.jdbc.driver</name>
        <value>com.mysql.jdbc.Driver</value>
        <description>JDBC driver class.</description>
    </property>
    <property>
        <name>oozie.service.JPAService.jdbc.url</name>
        <value>jdbc:mysql://nna:3306/oozie</value>
        <description>JDBC URL.</description>
    </property>
    <property>
        <name>oozie.service.JPAService.jdbc.username</name>
        <value>root</value>
        <description>DB user name.</description>
    </property>
    <property>
        <name>oozie.service.JPAService.jdbc.password</name>
        <value>root</value>
        <description>DB user password.</description>
    </property>
Here we create the database manually. If the oozie.service.JPAService.create.db.schema property is set to true, it is created automatically instead. The manual script is as follows:

CREATE DATABASE oozie;
GRANT ALL ON oozie.* TO 'root'@'nna' IDENTIFIED BY 'root';
FLUSH PRIVILEGES;
Then generate the data tables with the following command:

# Run from the $OOZIE_HOME/bin directory
./ooziedb.sh create -sqlfile oozie.sql -run
The output is as follows:

[hadoop@nna bin]$ ./ooziedb.sh create -sqlfile oozie.sql -run
  setting CATALINA_OPTS="$CATALINA_OPTS -Xmx1024m"

Validate DB Connection
DONE
DB schema does not exist
Check OOZIE_SYS table does not exist
DONE
Create SQL schema
DONE
Create OOZIE_SYS table
DONE

Oozie DB has been created for Oozie version '4.2.0'


The SQL commands have been written to: oozie.sql

If the command runs successfully, the oozie.sql script is generated.

3.4 Start

Next, start Oozie with the following command:

# Run from the $OOZIE_HOME/bin directory
./oozie-start.sh
A normal start looks like this:

[hadoop@nna bin]$ ./oozie-start.sh
WARN: Use of this script is deprecated; use 'oozied.sh start' instead

Setting OOZIE_HOME: /home/hadoop/oozie-4.2.0/distro/target/oozie-4.2.0-distro/oozie-4.2.0
Setting OOZIE_CONFIG: /home/hadoop/oozie-4.2.0/distro/target/oozie-4.2.0-distro/oozie-4.2.0/conf
Sourcing: /home/hadoop/oozie-4.2.0/distro/target/oozie-4.2.0-distro/oozie-4.2.0/conf/oozie-env.sh
  setting CATALINA_OPTS="$CATALINA_OPTS -Xmx1024m"
Setting OOZIE_CONFIG_FILE: oozie-site.xml
Setting OOZIE_DATA: /home/hadoop/oozie-4.2.0/distro/target/oozie-4.2.0-distro/oozie-4.2.0/data
Setting OOZIE_LOG: /home/hadoop/oozie-4.2.0/distro/target/oozie-4.2.0-distro/oozie-4.2.0/logs
Setting OOZIE_LOG4J_FILE: oozie-log4j.properties
Setting OOZIE_LOG4J_RELOAD: 10
Setting OOZIE_HTTP_HOSTNAME: nna
Setting OOZIE_HTTP_PORT: 11000
Setting OOZIE_ADMIN_PORT: 11001
Setting OOZIE_HTTPS_PORT: 11443
Setting OOZIE_BASE_URL: http://nna:11000/oozie
Setting CATALINA_BASE: /home/hadoop/oozie-4.2.0/distro/target/oozie-4.2.0-distro/oozie-4.2.0/oozie-server
Setting OOZIE_HTTPS_KEYSTORE_FILE: /home/hadoop/.keystore
Setting OOZIE_HTTPS_KEYSTORE_PASS: password
Setting OOZIE_INSTANCE_ID: nna
Setting CATALINA_OUT: /home/hadoop/oozie-4.2.0/distro/target/oozie-4.2.0-distro/oozie-4.2.0/logs/catalina.out
Setting CATALINA_PID: /home/hadoop/oozie-4.2.0/distro/target/oozie-4.2.0-distro/oozie-4.2.0/oozie-server/temp/oozie.pid

Using CATALINA_OPTS: -Xmx1024m -Dderby.stream.error.file=/home/hadoop/oozie-4.2.0/distro/target/oozie-4.2.0-distro/oozie-4.2.0/logs/derby.log
Adding to CATALINA_OPTS: -Doozie.home.dir=/home/hadoop/oozie-4.2.0/distro/target/oozie-4.2.0-distro/oozie-4.2.0 -Doozie.config.dir=/home/hadoop/oozie-4.2.0/distro/target/oozie-4.2.0-distro/oozie-4.2.0/conf -Doozie.log.dir=/home/hadoop/oozie-4.2.0/distro/target/oozie-4.2.0-distro/oozie-4.2.0/logs -Doozie.data.dir=/home/hadoop/oozie-4.2.0/distro/target/oozie-4.2.0-distro/oozie-4.2.0/data -Doozie.instance.id=nna -Doozie.config.file=oozie-site.xml -Doozie.log4j.file=oozie-log4j.properties -Doozie.log4j.reload=10 -Doozie.http.hostname=nna -Doozie.admin.port=11001 -Doozie.http.port=11000 -Doozie.https.port=11443 -Doozie.base.url=http://nna:11000/oozie -Doozie.https.keystore.file=/home/hadoop/.keystore -Doozie.https.keystore.pass=password -Djava.library.path=

Setting up oozie DB
  setting CATALINA_OPTS="$CATALINA_OPTS -Xmx1024m"

Validate DB Connection
DONE
DB schema exists

The SQL commands have been written to: /tmp/ooziedb-9100396876446618885.sql

Using CATALINA_BASE: /home/hadoop/oozie-4.2.0/distro/target/oozie-4.2.0-distro/oozie-4.2.0/oozie-server
Using CATALINA_HOME: /home/hadoop/oozie-4.2.0/distro/target/oozie-4.2.0-distro/oozie-4.2.0/oozie-server
Using CATALINA_TMPDIR: /home/hadoop/oozie-4.2.0/distro/target/oozie-4.2.0-distro/oozie-4.2.0/oozie-server/temp
Using JRE_HOME: /usr/java/jdk1.7
Using CLASSPATH: /home/hadoop/oozie-4.2.0/distro/target/oozie-4.2.0-distro/oozie-4.2.0/oozie-server/bin/bootstrap.jar
Using CATALINA_PID: /home/hadoop/oozie-4.2.0/distro/target/oozie-4.2.0-distro/oozie-4.2.0/oozie-server/temp/oozie.pid
4. Screenshot Preview

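Then open the Oozie address in a browser (http://nna:11000/oozie in this setup, as printed for OOZIE_BASE_URL in the startup log) to check whether it started normally.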
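The same check can be scripted rather than done in a browser. The following is a minimal sketch, assuming curl is installed and that the console listens at the OOZIE_BASE_URL printed in the startup log; the function names oozie_url and oozie_is_up are made up for illustration.

```shell
# Build the web console URL from a host and an optional port (default 11000).
oozie_url() {
  printf 'http://%s:%s/oozie' "$1" "${2:-11000}"
}

# Return 0 if the console answers with an HTTP success status, non-zero otherwise.
# -s silences progress output, -f makes curl fail on HTTP errors,
# -o /dev/null discards the page body, --max-time bounds the wait.
oozie_is_up() {
  curl -sf -o /dev/null --max-time 5 "$(oozie_url "$@")"
}

# Example call:
#   oozie_is_up nna 11000 && echo "Oozie console reachable"
```

The Oozie command-line client offers a comparable check: oozie admin -oozie http://nna:11000/oozie -status reports the server's system mode.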

5. Summary

Integrating Oozie is a little tedious. This post covers integrating Oozie on its own; follow-up posts will cover integrating Oozie into the Hadoop cluster and introduce how to use its workflows.

6. Conclusion

That is all for this post. If you run into any questions while studying, you can join the discussion group or send me an email, and I will do my best to answer them. Let us encourage each other!
     
         
       
         
  CopyRight 2002-2016 newfreesoft.com, All Rights Reserved.