In Hadoop applications, business metrics keep iterating and growing more complicated, and managing Hadoop-related applications becomes difficult. Scheduling dependent jobs, monitoring task execution, and investigating failures all make our daily work more complicated. When we lack the resources and effort to develop our own scheduling system, choosing a third-party open-source scheduler is an excellent way to reduce that complexity. Today I compare several common scheduling systems to help you choose.
Oozie is an open-source project currently hosted by the Apache Software Foundation. My previous blog post, "Oozie scheduling," described how Oozie schedules Hadoop-related jobs. As that post shows, the configuration process is somewhat cumbersome: configuring scheduled tasks takes a lot of effort, and the visual interface is not very intuitive. If you have high expectations for the UI, this scheduler may let you down. Interested readers can find the details in the "Oozie scheduling" article; I will not repeat them here.
Zeus is a platform for Hadoop jobs that supports the entire life cycle of a task, from debugging and running Hadoop tasks to periodically scheduling production tasks. Functionally, it supports the following:
Scheduling and running Hadoop MapReduce tasks
Scheduling and running Hive tasks
Running shell tasks
Visual querying of Hive metadata, with data preview
Automatic scheduling of Hadoop tasks
Its source code is hosted on GitHub; search GitHub for Zeus to find the project. Zeus was open-sourced by Alibaba, and its documentation on GitHub is fairly detailed. For installation and usage, refer to the official documentation there; I will not go into details here.
Azkaban is a batch workflow job scheduler created by LinkedIn to run Hadoop jobs. It provides an easy-to-use web user interface for tracking and maintaining your workflows.
In addition, Azkaban has a substantial body of source code contributions on GitHub, so secondary development on top of it is not especially difficult. Its features include the following:
Compatible with any version of Hadoop
Easy-to-use web UI
Simple web and HTTP workflow uploads
Modular and pluggable
Authentication and authorization
Tracking of user actions
Email alerts on failure and success
Retrying of failed jobs
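To make the idea of a dependency-driven workflow with job retries concrete, here is a minimal Python sketch. It assumes job definitions written in the key=value style of Azkaban .job files (`type`, `command`, and a comma-separated `dependencies` list); the job names, `parse_job`, `execution_order`, and `run_flow` are hypothetical helpers for illustration, not Azkaban's own implementation.

```python
from graphlib import TopologicalSorter  # Python 3.9+

# Hypothetical job definitions in the key=value style of Azkaban .job files.
JOB_FILES = {
    "extract": "type=command\ncommand=echo extract",
    "transform": "type=command\ncommand=echo transform\ndependencies=extract",
    "load": "type=command\ncommand=echo load\ndependencies=transform",
    "report": "type=command\ncommand=echo report\ndependencies=load,extract",
}


def parse_job(text: str) -> dict[str, str]:
    """Parse key=value lines into a dict, skipping blank lines and # comments."""
    props = {}
    for line in text.splitlines():
        line = line.strip()
        if not line or line.startswith("#"):
            continue
        key, _, value = line.partition("=")
        props[key.strip()] = value.strip()
    return props


def execution_order(job_files: dict[str, str]) -> list[str]:
    """Return job names so that every job comes after all of its dependencies."""
    graph = {}
    for name, text in job_files.items():
        deps = parse_job(text).get("dependencies", "")
        graph[name] = [d.strip() for d in deps.split(",") if d.strip()]
    # static_order() raises graphlib.CycleError if the dependencies form a cycle.
    return list(TopologicalSorter(graph).static_order())


def run_flow(order, run_job, retries=1):
    """Run jobs in order; retry a failed job up to `retries` extra times.

    `run_job(name)` is a caller-supplied function returning True on success.
    If a job still fails after its retries, the flow stops and downstream
    jobs are not run.
    """
    results = {}
    for name in order:
        for _ in range(retries + 1):
            if run_job(name):
                results[name] = "SUCCEEDED"
                break
        else:
            results[name] = "FAILED"
            break
    return results


if __name__ == "__main__":
    order = execution_order(JOB_FILES)
    print(order)  # ['extract', 'transform', 'load', 'report']
    print(run_flow(order, lambda name: True))
```

Because the sample jobs form a single chain, only one valid order exists; with branching dependencies, a scheduler is free to run independent jobs in any order, or in parallel.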
Azkaban was designed primarily with usability in mind. It has been running at LinkedIn for several years and drives many of their Hadoop and data warehouse processes.
It consists of three key components:
Relational database (MySQL): Azkaban uses MySQL to store much of its state. Both AzkabanWebServer and AzkabanExecutorServer access the database.
AzkabanWebServer: the web server uses the database for the following:
Project management: the projects, the permissions on them, and the uploaded files.
Executing flow state: keeps track of executing flows and which executor is running them.
Previous flows/jobs: search through previous executions of jobs and flows and access their log files.
Scheduler: keeps the state of scheduled jobs.
SLA: keeps all the SLA rules.
AzkabanExecutorServer: the executor server uses the database for the following:
Accessing projects: retrieves project files from the database.
Executing flows/jobs: retrieves and updates data for the flows and jobs that are executing.
Logs: stores the output logs of jobs and flows in the database.
Inter-flow dependencies: if a flow is running on a different executor, it takes that flow's state from the database.
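The key point of this architecture is that the web server and the executor do not need to talk to each other directly: one writes state to the shared database and the other reads it. The sketch below illustrates that pattern in Python, with in-memory SQLite standing in for MySQL. The `executions` table, its columns, and the function names are hypothetical and are not Azkaban's actual schema or code.

```python
import sqlite3

# Hypothetical table; in-memory SQLite stands in for Azkaban's MySQL here.
SCHEMA = """
CREATE TABLE executions (
    exec_id  INTEGER PRIMARY KEY AUTOINCREMENT,
    flow     TEXT NOT NULL,
    status   TEXT NOT NULL,
    executor TEXT
)
"""


def web_submit(conn, flow):
    """Web server side: record a new execution request and return its id."""
    cur = conn.execute(
        "INSERT INTO executions (flow, status) VALUES (?, 'PREPARING')", (flow,)
    )
    conn.commit()
    return cur.lastrowid


def executor_claim(conn, executor):
    """Executor side: claim the oldest pending execution, or return None.

    This select-then-update is not safe with several concurrent executors;
    a real deployment would need row locking or an atomic conditional update.
    """
    row = conn.execute(
        "SELECT exec_id FROM executions WHERE status = 'PREPARING' "
        "ORDER BY exec_id LIMIT 1"
    ).fetchone()
    if row is None:
        return None
    conn.execute(
        "UPDATE executions SET status = 'RUNNING', executor = ? WHERE exec_id = ?",
        (executor, row[0]),
    )
    conn.commit()
    return row[0]


def executor_finish(conn, exec_id, ok):
    """Executor side: write the final status back to the shared database."""
    conn.execute(
        "UPDATE executions SET status = ? WHERE exec_id = ?",
        ("SUCCEEDED" if ok else "FAILED", exec_id),
    )
    conn.commit()


def web_status(conn, exec_id):
    """Web server side: read back the state the executor wrote."""
    return conn.execute(
        "SELECT status, executor FROM executions WHERE exec_id = ?", (exec_id,)
    ).fetchone()


if __name__ == "__main__":
    conn = sqlite3.connect(":memory:")
    conn.execute(SCHEMA)
    eid = web_submit(conn, "daily_etl")
    claimed = executor_claim(conn, "executor-1")
    executor_finish(conn, claimed, ok=True)
    print(web_status(conn, eid))  # ('SUCCEEDED', 'executor-1')
```

Keeping state in the database also means a restarted web server can rebuild its view of running and finished flows without contacting the executors.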
Its configuration and usage are described in detail in the official documentation, so I will not go into them here; you can read the official documentation on GitHub.
Those are the three scheduling systems; choose the one that fits your situation. Alternatively, if conditions permit and you have the resources, you can study how these schedulers work and develop your own scheduling system tailored to your current business, which may well be a good option.
That's all for this post. If you have any questions while studying, you can join the discussion group or email me, and I will do my best to answer them. Let's encourage each other!