  Hadoop - Task Scheduling System Comparison
  Add Date : 2018-11-21      
1. Overview

In Hadoop applications, as business metrics iterate and grow more complicated, managing the related Hadoop jobs becomes difficult. Scheduling jobs that depend on one another, monitoring running tasks, and investigating abnormal behavior all make daily work more complex. When there is neither the means nor the time to develop a scheduling system in-house, choosing a good third-party open source scheduling system is a sound way to reduce that complexity. In this post I compare several common scheduling systems so that you can choose among them.
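
To make "scheduling jobs that depend on one another" concrete, here is a minimal sketch in Python. It is not taken from any of the systems compared below; the job names are hypothetical, and it only shows the core idea that a scheduler must run each job after the jobs it depends on. It assumes Python 3.9 or later for the standard `graphlib` module.

```python
from graphlib import TopologicalSorter

# Hypothetical jobs: each key maps to the set of jobs it depends on.
dependencies = {
    "import_logs": set(),
    "clean_logs": {"import_logs"},
    "daily_metrics": {"clean_logs"},
    "user_metrics": {"clean_logs"},
    "report": {"daily_metrics", "user_metrics"},
}


def execution_order(deps):
    """Return one valid run order in which every job follows its dependencies."""
    return list(TopologicalSorter(deps).static_order())


order = execution_order(dependencies)
print(order)
```

The two metrics jobs may appear in either order, since neither depends on the other. A real scheduler adds timing, monitoring, and failure handling on top of this ordering.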

2. Content

2.1 Oozie

Oozie is an open source project currently hosted at the Apache Foundation. An earlier blog post, "Oozie scheduling," describes how Oozie schedules Hadoop-related jobs. As that post shows, the configuration process is somewhat cumbersome and complex, and configuring scheduled tasks takes a lot of effort. Its web interface is also not very intuitive, so readers who expect a polished UI are likely to be disappointed by this scheduling system. If you are interested, the "Oozie scheduling" post covers the details; I do not go into them here.

2.2 Zeus

Zeus is a platform for Hadoop jobs. It supports the entire life cycle of a task, from debugging and running Hadoop tasks to scheduling them periodically in production. Functionally, it supports the following:

Scheduling and running Hadoop MapReduce tasks
Scheduling and running Hive tasks
Running Shell tasks
Visual querying of Hive metadata and data preview
Automatic scheduling of Hadoop tasks

Its source code is on GitHub; search GitHub for Zeus to find the project. Zeus was open-sourced by Alibaba, and the documentation on GitHub is fairly detailed. For installation and usage, refer to the official documentation on GitHub; I do not go into details here.

2.3 Azkaban

Azkaban is a batch workflow scheduler created at LinkedIn to run Hadoop jobs. It provides an easy-to-use web interface for tracking and maintaining your workflows.

In addition, the amount of source code contributed to the Azkaban scheduling system on GitHub is not large, a point to weigh when considering secondary development. Its features include the following:

Compatible with any version of Hadoop
Easy-to-use web UI
Simple web and HTTP workflow uploads
Project workspaces
Workflow scheduling
Modular and pluggable
Authentication and authorization
Tracking of user actions
Email alerts on failure and success
SLA alerting
Retrying of failed jobs

Azkaban was designed primarily with usability in mind. It has been running at LinkedIn for several years and drives their Hadoop and data warehouse processes.
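
Two of the features listed above, retrying failed jobs and alerting on failure and success, can be illustrated with a small Python sketch. This is not Azkaban's implementation or API; `run_with_retries`, `max_retries`, and `notify` are hypothetical names, and `notify` merely stands in for an email alert.

```python
def run_with_retries(job, max_retries=2, notify=print):
    """Run job(); retry on failure up to max_retries times, then report the outcome.

    Returns True on success and False once all attempts have failed.
    """
    attempts = max_retries + 1
    for attempt in range(1, attempts + 1):
        try:
            job()
        except Exception as exc:
            notify(f"attempt {attempt}/{attempts} failed: {exc}")
        else:
            notify(f"succeeded on attempt {attempt}")
            return True
    notify("job failed after all retries")
    return False


# Usage: a hypothetical job that fails twice and then succeeds.
calls = {"count": 0}


def flaky_job():
    calls["count"] += 1
    if calls["count"] < 3:
        raise RuntimeError("temporary failure")


messages = []  # collected here instead of being sent as alerts
ok = run_with_retries(flaky_job, max_retries=2, notify=messages.append)
```

Passing the alert function in as a parameter keeps the retry logic independent of how alerts are delivered, which is one simple way to get the kind of pluggability the feature list mentions.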

It consists of three key components, namely:

Relational database (MySQL): Azkaban uses MySQL to store some of its state. Both the AzkabanWebServer and the AzkabanExecutorServer need to access the database.
AzkabanWebServer: the web server uses the database for the following reasons:
  Project management: manages projects, their permissions, and uploaded files.
  Execution state of flows: keeps track of the flows that are currently executing.
  Previous flows or jobs: searches through previous jobs and flows and accesses their log files.
  Scheduler: keeps the state of the scheduled jobs.
  SLA: keeps all the SLA rules.
AzkabanExecutorServer: the executor server uses the database for the following reasons:
  Access to projects: retrieves project files from the database.
  Executing flows or jobs: retrieves and updates the data of flows that are executing.
  Logs: stores the output logs of jobs and flows in the database.
  Dependencies between flows: if a flow is running on a different executor, its state is read from the database.
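
The division of labor above, where the web server and the executor server coordinate only through a shared database, can be sketched as follows. This is an illustration under stated assumptions, not Azkaban's schema or code: an in-memory SQLite database stands in for MySQL, and the `flow_state` table, its columns, the status values, and the three functions are all hypothetical.

```python
import sqlite3

# In-memory SQLite stands in for the MySQL database; the table is hypothetical.
db = sqlite3.connect(":memory:")
db.execute(
    "CREATE TABLE flow_state (flow_id INTEGER PRIMARY KEY, name TEXT, status TEXT)"
)


def submit_flow(conn, name):
    """Web server side: record a new flow and return its id."""
    cur = conn.execute(
        "INSERT INTO flow_state (name, status) VALUES (?, ?)", (name, "PREPARING")
    )
    conn.commit()
    return cur.lastrowid


def update_status(conn, flow_id, status):
    """Executor side: write the current status of a flow."""
    conn.execute(
        "UPDATE flow_state SET status = ? WHERE flow_id = ?", (status, flow_id)
    )
    conn.commit()


def read_status(conn, flow_id):
    """Either side: read the last recorded status of a flow, or None if unknown."""
    row = conn.execute(
        "SELECT status FROM flow_state WHERE flow_id = ?", (flow_id,)
    ).fetchone()
    return row[0] if row else None


flow_id = submit_flow(db, "daily_report")
update_status(db, flow_id, "RUNNING")
update_status(db, flow_id, "SUCCEEDED")
print(read_status(db, flow_id))
```

Because every component reads and writes the same table, a flow running on one executor can have its state checked from the web server or from another executor without the two processes talking to each other directly.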

Its configuration and usage are covered in detail in the official documentation, so I do not go into them here. You can read the official documentation on GitHub.

3. Summary

When choosing a scheduling system, the three above are the options covered here; pick whichever fits your situation. In addition, if conditions permit or you have the energy, you can study the principles behind these scheduling systems and develop one that meets your current business needs. That may well be an option too.

4. Conclusion

That is all for this post. If you run into any problems while studying, you can join the group discussion or send me an email, and I will do my best to answer your questions. Let us encourage each other!
  CopyRight 2002-2016 newfreesoft.com, All Rights Reserved.