  Hadoop - Task Scheduling System Comparison
  Add Date : 2018-11-21      
1. Overview

In Hadoop applications, as business metrics iterate and grow more complex, managing the related applications becomes difficult. Scheduling jobs that depend on each other, monitoring running tasks, and investigating failures all complicate daily work. When a team lacks the resources and the time to develop its own scheduling system, choosing a third-party open source scheduler is a good way to reduce that complexity. This post compares several common scheduling systems to help you choose.

2. Content

2.1 Oozie

Oozie is an open source project currently hosted at the Apache Foundation. An earlier post, "Oozie scheduling," describes how to use Oozie to schedule Hadoop-related tasks. As that post shows, the configuration process is somewhat cumbersome and complex, and configuring the scheduled tasks takes a lot of effort. In addition, its visual interface is not very intuitive, so readers with high expectations for the UI are likely to be disappointed. If you are interested in this scheduling system, see the "Oozie scheduling" post for details. I will not go into them here.

2.2 Zeus

Zeus is a platform for Hadoop jobs. It supports the whole life cycle of a task, from debugging and running Hadoop tasks to periodically scheduling production tasks. Functionally, it supports the following:

Scheduling and running Hadoop MapReduce tasks
Scheduling and running Hive tasks
Running Shell tasks
Visual querying of Hive metadata and data preview
Automatic scheduling of Hadoop tasks

The source code is on GitHub; search for Zeus there to find the project. Zeus was open-sourced by Alibaba, and its documentation on GitHub is fairly detailed. For installation and usage, refer to the official documentation on GitHub. I will not go into details here.

2.3 Azkaban

Azkaban is a batch workflow scheduler created by LinkedIn for running Hadoop jobs. It provides an easy-to-use web interface for tracking and maintaining your workflows.

In addition, the amount of source code contributed to the Azkaban scheduling system on GitHub is not large, which is worth keeping in mind if you plan to do secondary development. Its features include the following:

Compatible with any version of Hadoop
Easy-to-use web UI
Simple web and HTTP workflow uploads
Project workspaces
Workflow scheduling
Modular and pluggable
Authentication and authorization
Tracking of user actions
Email alerts on failure and success
SLA alerting
Retrying of failed jobs

Azkaban was designed primarily with usability in mind. It has been running at LinkedIn for several years and drives their Hadoop and data warehouse processes.
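
Two of the features above, workflow scheduling and retrying of failed jobs, can be illustrated with a short Python sketch. This is not Azkaban code: the flow, the job names, the status labels, and the `run_flow` helper are all hypothetical and only show the general idea of running jobs in dependency order with a bounded number of retries.

```python
from graphlib import TopologicalSorter  # standard library, Python 3.9+

# Hypothetical flow: each job name maps to the set of jobs it depends on.
FLOW = {
    "extract": set(),
    "clean": {"extract"},
    "aggregate": {"clean"},
    "report": {"aggregate"},
}


def execution_order(dependencies):
    """Return a job order in which every job comes after its dependencies."""
    return list(TopologicalSorter(dependencies).static_order())


def run_flow(dependencies, actions, max_attempts=2):
    """Run jobs in dependency order, retrying each failed job.

    A job whose dependency did not succeed is marked CANCELLED and not run.
    """
    status = {}
    for name in execution_order(dependencies):
        if any(status[dep] != "SUCCEEDED" for dep in dependencies[name]):
            status[name] = "CANCELLED"
            continue
        status[name] = "FAILED"
        for _ in range(max_attempts):
            try:
                actions[name]()
            except Exception:
                continue  # try again until the attempts run out
            status[name] = "SUCCEEDED"
            break
    return status


if __name__ == "__main__":
    # Each hypothetical job just prints its own name.
    actions = {name: (lambda n=name: print("running", n)) for name in FLOW}
    print(run_flow(FLOW, actions))
```

A real scheduler also has to handle timing, parallel branches, and persistence, which this sketch leaves out on purpose.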

It consists of three key components:

Relational database (MySQL): Azkaban uses MySQL to store some of its state. Both the AzkabanWebServer and the AzkabanExecutorServer need to access this database.
AzkabanWebServer: The web server uses the database for the following:
  Project management: projects, their permissions, and the uploaded files.
  Executing flow state: keeps track of the flows that are currently executing.
  Previous flows and jobs: searches through previous executions of jobs and flows and accesses their log files.
  Scheduler: keeps the state of the scheduled jobs.
  SLA: keeps all the SLA rules.
AzkabanExecutorServer: The executor server uses the database for the following:
  Access the project: retrieves project files from the database.
  Executing flows and jobs: retrieves and updates data for the flows and jobs that are executing.
  Logs: stores the output logs of jobs and flows in the database.
  Dependencies between flows: if a flow is running on a different executor, its state is taken from the database.
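
The pattern described above, in which the web server and the executor servers coordinate through shared database state, can be sketched in Python as follows. This is only an illustration under stated assumptions: it uses the standard library `sqlite3` module with an in-memory database as a stand-in for MySQL, and the `execution` table, its columns, the status values, and the helper functions are hypothetical, not Azkaban's actual schema or API.

```python
import sqlite3


def open_state_db():
    """Create an in-memory database with a hypothetical flow-state table."""
    conn = sqlite3.connect(":memory:")
    conn.execute(
        "CREATE TABLE execution (flow TEXT PRIMARY KEY, status TEXT NOT NULL)"
    )
    return conn


def submit_flow(conn, flow):
    """Web server role: record that a flow has been submitted."""
    conn.execute(
        "INSERT INTO execution (flow, status) VALUES (?, 'PREPARING')", (flow,)
    )
    conn.commit()


def update_status(conn, flow, status):
    """Executor role: write the current state of a flow it is running."""
    conn.execute("UPDATE execution SET status = ? WHERE flow = ?", (status, flow))
    conn.commit()


def read_status(conn, flow):
    """Any server: read a flow's state, or None if the flow is unknown."""
    row = conn.execute(
        "SELECT status FROM execution WHERE flow = ?", (flow,)
    ).fetchone()
    return row[0] if row else None


conn = open_state_db()
submit_flow(conn, "daily_report")             # done by the web server
update_status(conn, "daily_report", "RUNNING")  # done by an executor
print(read_status(conn, "daily_report"))      # visible to every server
```

Because every server reads and writes the same table, an executor can learn the state of a flow that is running elsewhere without talking to the other executor directly.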

The official documentation covers its configuration and usage in detail, so I will not go into them here. You can read the official documentation on GitHub.

3. Summary

When choosing a scheduling system, the three above are the options covered here; pick the one that fits your situation. Alternatively, if conditions permit and you have the energy, you can study the principles behind these scheduling systems and develop one that meets your current business needs. That may well be a good option.

4. Conclusion

That is all for this post. If you run into any questions while studying, you can join the group discussion or send me an email, and I will do my best to answer them. Let us encourage each other!
  CopyRight 2002-2016 newfreesoft.com, All Rights Reserved.