Heartbeat is an open-source high-availability cluster system for Linux. It consists of two major components: heartbeat monitoring and resource takeover. Nodes monitor each other over network links and/or serial ports (redundant links are supported), exchanging messages that report their current state. If a node receives no packets from its peer within a specified time, the peer is considered failed, and the resource-takeover module starts the failed node's resources or services on another host. This article briefly describes the Heartbeat v2 cluster architecture, its components, and related concepts, for your reference.
I. Characteristics of a high-availability cluster
High availability of services
This is the primary purpose of clustering and its most visible role. The ultimate goal is to keep services available in real time, so that no hardware or software failure results in service termination or unavailability.
Availability is measured in terms of system reliability and serviceability (maintainability). In engineering, mean time to failure (MTTF) is used to measure a system's reliability, and mean time to repair (MTTR) to measure its maintainability. The availability formula is: HA = MTTF / (MTTF + MTTR) * 100%
99% availability: downtime of no more than about 4 days per year
99.9% availability: downtime of no more than about 10 hours per year
99.99% availability: downtime of no more than about 1 hour per year
99.999% availability: downtime of no more than about 6 minutes per year
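The availability formula can be checked with a short calculation. The MTTF/MTTR figures below are made-up illustrative numbers, not measurements:

```shell
# Availability = MTTF / (MTTF + MTTR); awk does the floating-point arithmetic.
awk 'BEGIN {
  mttf = 8751; mttr = 9                     # illustrative hours, not measurements
  ha = mttf / (mttf + mttr)
  printf "availability: %.4f%%\n", ha * 100                 # ~99.9%
  printf "downtime/year at 99%%: %.1f h\n",    (1 - 0.99)   * 8760
  printf "downtime/year at 99.99%%: %.1f h\n", (1 - 0.9999) * 8760
}'
```

Note that 1% of a year is 87.6 hours (about 4 days), which matches the first row of the table above.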
Cluster software must include a mechanism for defining which systems can serve as cluster nodes (two or more nodes must be defined). Every host in the cluster is called a node.
Cluster Services and Resources
A service or application that can fail over between nodes; the nodes communicate and are interconnected so that services can be transferred between them. A service typically comprises multiple resources; several resources together form one service. For example, a highly available MySQL service requires resources such as a virtual IP (VIP), the mysqld daemon, and a shared or mirrored disk. Managing a cluster service is, in practice, managing its resources.
Resource isolation and split-brain
When a node goes down due to hardware failure, resource contention can arise between the failed node and the surviving nodes, with the failed node appearing to coexist with normal ones. When a node that controls cluster resources fails, resource isolation (fencing mechanisms such as STONITH) is applied to prevent split-brain from occurring.
Cluster status monitoring
Common services and applications are configured, monitored, and failed over through cluster management and monitoring tools together with predefined scripts. The best-known monitoring mechanism is the heartbeat, which in a cluster environment lets the nodes sense each other's presence. It can communicate over serial links, or via unicast, broadcast, or multicast on the network. Once the heartbeat fails, the corresponding resource transfer and cluster reconfiguration actions are triggered.
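In Heartbeat, these communication paths are declared in ha.cf. A sketch of the media directives follows; the interface names and addresses are examples, not values from this article:

```
# /etc/ha.d/ha.cf -- heartbeat media sketch (interfaces/addresses are examples)
serial /dev/ttyS0              # redundant heartbeat over a serial link
baud   19200
bcast  eth1                    # broadcast heartbeat on a dedicated NIC
mcast  eth0 225.0.0.1 694 1 0  # multicast: interface, group, port, ttl, loop
ucast  eth0 192.168.1.2        # unicast heartbeat to the peer node
```

In practice a cluster configures at least two of these paths so that the loss of a single link is not mistaken for the loss of a node.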
II. Heartbeat components
Heartbeat is an open-source high-availability cluster system for Linux, consisting of two major components: heartbeat monitoring and resource takeover. By major version, its development falls into three stages.
1. Heartbeat 1.x components
Heartbeat 1.x configures cluster nodes and resources through two files in the /etc/ha.d directory:
ha.cf: defines the cluster nodes, the failure-detection and failover time intervals, node fencing, timing parameters, and the logging mechanism and method
haresources: defines the cluster resource groups; each line defines a default node together with a failover resource group, whose resources can include IP addresses, file systems, and services or applications
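A minimal sketch of the two files, using the MySQL scenario mentioned earlier. All node names, addresses, devices, and mount points are examples:

```
# /etc/ha.d/ha.cf -- timing and node sketch (node names are examples)
keepalive 2          # seconds between heartbeats
deadtime  30         # declare a peer dead after 30s of silence
warntime  10         # log a warning after 10s of silence
initdead  120        # extra allowance while nodes boot
auto_failback on
node node1 node2

# /etc/ha.d/haresources -- one line: default node + its resource group
# (a bare IP is shorthand for the IPaddr agent; arguments are joined with ::)
node1 192.168.1.100 Filesystem::/dev/sdb1::/var/lib/mysql::ext3 mysqld
```

The haresources line reads left to right in start order: the VIP comes up, the shared disk is mounted, then mysqld starts; stopping proceeds in reverse.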
2. Heartbeat 2.x components
Heartbeat 2.0 builds on the Heartbeat 1.x configuration and introduces a new modular structure, the Cluster Resource Manager (CRM). The CRM model supports up to 16 nodes and uses an XML-based Cluster Information Base (CIB) for configuration. The last official stable release of the Heartbeat 2.x series is 2.1.4.
The CIB file (/var/lib/heartbeat/crm/cib.xml) is automatically replicated to each node. It defines the following objects and actions:
* Cluster nodes
* Cluster resources, including their attributes, priorities, groups, and dependencies
* Logging, monitoring, quorum, and fencing standards
* The actions to perform when a service fails or a defined criterion is met
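For orientation, a heavily trimmed sketch of what cib.xml looks like in Heartbeat 2.x. The ids, node names, and the IPaddr resource are examples, not taken from this article:

```xml
<!-- Trimmed sketch of /var/lib/heartbeat/crm/cib.xml; ids/values are examples -->
<cib>
  <configuration>
    <crm_config/>
    <nodes>
      <node id="uuid-1" uname="node1" type="normal"/>
      <node id="uuid-2" uname="node2" type="normal"/>
    </nodes>
    <resources>
      <primitive id="vip" class="ocf" provider="heartbeat" type="IPaddr">
        <instance_attributes id="vip-attrs">
          <attributes>
            <nvpair id="vip-ip" name="ip" value="192.168.1.100"/>
          </attributes>
        </instance_attributes>
      </primitive>
    </resources>
    <constraints/>
  </configuration>
  <status/>
</cib>
```

The configuration section holds what the administrator defines (nodes, resources, constraints); the status section is maintained by the cluster itself.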
Heartbeat 2.x component diagram
Messaging and Infrastructure Layer
The first, lowest layer is the messaging/infrastructure layer, also known as the heartbeat layer. #Author: Leshami
This layer contains the components that send out the heartbeat messages carrying the "I am alive" signal, along with other information.
The heartbeat program resides in the messaging/infrastructure layer. #Blog: http://blog.csdn.net/leshami
Membership Layer
The membership layer obtains information from the underlying heartbeat layer; it is responsible for computing the largest fully connected set of cluster nodes and synchronizing the membership set to all nodes. This layer ensures the consistency of cluster membership and provides the cluster topology to the layer above.
Resource Allocation Layer
The third layer is the resource allocation layer. This layer is the most complex and consists of the following components:
Cluster Resource Manager (CRM)
Every action in the resource allocation layer passes through the cluster resource manager. Any component of the resource allocation layer, or any higher-level component that needs to communicate, does so through the local cluster resource manager. On each node, the cluster resource manager maintains the cluster information base, or CIB (see Cluster Information Base below).
One node in the cluster is elected as the Designated Coordinator (DC), which means it holds the master CIB; all other CIBs in the cluster are replicas of the master CIB. Normal read and write operations on the CIB are serialized through the master CIB. The DC is the node that can decide which cluster-wide changes need to be performed, such as fencing a node or moving resources.
Cluster Information Base (CIB)
The cluster information base, or CIB, is a memory-resident XML representation of the entire cluster's configuration and status, including node membership, resources, and constraints. In the cluster, the master CIB is maintained by the DC; all other nodes hold a CIB replica.
Administrators can manage the cluster with the cibadmin command-line tool or the Heartbeat GUI tool. The Heartbeat GUI tool can be used from any machine connected to the cluster; the cibadmin command must be run on a cluster node, but it is not limited to the DC node.
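A few illustrative cibadmin invocations follow. These reflect the tool's common Heartbeat 2.x usage; the resource.xml file is hypothetical and would contain a resource definition such as a <primitive> element:

```
# Query the current CIB (may be run on any cluster node):
cibadmin -Q

# Query only the resources section:
cibadmin -Q -o resources

# Add a resource definition from a file into the resources section
# (resource.xml is a hypothetical file, not one shipped with Heartbeat):
cibadmin -C -o resources -x resource.xml
```

Whichever node the command runs on, the change is relayed to the DC, applied to the master CIB, and replicated back out.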
Policy Engine (PE) and Transition Engine (TE)
Whenever the designated coordinator needs to make a cluster-wide change (building a new CIB), the policy engine computes the next state of the cluster and the list of (resource) actions required to reach it. The commands computed by the policy engine are then carried out by the transition engine. The DC sends messages to the relevant cluster resource managers, which then use their local resource managers (LRM) to perform the necessary resource operations. The PE and TE pair runs only on the DC node.
Local Resource Manager (LRM)
The local resource manager calls the local resource agents on behalf of the CRM, so that it can perform start/stop/monitor operations and report the results to the CRM. The LRM holds all resource-related information for its local node.
Resource Layer
The fourth and highest layer is the resource layer. The resource layer includes one or more resource agents (RA). A resource agent is a program, usually a shell script, that starts, stops, and monitors a service (resource). The most common resource agents are LSB init scripts; however, Heartbeat also supports a more flexible and powerful open-architecture resource agent API: agents written to the OCF specification. Resource agents are invoked only by the local resource manager. Third parties can place their own agents in the file system to integrate their software into the cluster.
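A minimal sketch of an LSB-style resource agent for a hypothetical "mydaemon" service. A real agent would manage an actual daemon process; here a state file stands in for it so the start/stop/monitor control flow is visible:

```shell
#!/bin/sh
# Sketch of an LSB-style resource agent ("mydaemon" is hypothetical).
# A state file stands in for the running daemon process.
STATE=/tmp/mydaemon.state

start()   { touch "$STATE" && echo "mydaemon started"; }
stop()    { rm -f "$STATE" && echo "mydaemon stopped"; }
monitor() { if [ -f "$STATE" ]; then echo "running"; else echo "stopped"; fi; }

# The LRM invokes the agent with an action argument:
case "$1" in
  start)          start ;;
  stop)           stop ;;
  monitor|status) monitor ;;
  *)              echo "usage: $0 {start|stop|monitor}" ;;
esac
```

An OCF agent follows the same start/stop/monitor shape but takes its parameters from OCF_RESKEY_* environment variables and returns the standardized OCF exit codes.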
3. Heartbeat 3.x components
From version 3 onward, the Heartbeat project was split into separate sub-projects, each developed independently: heartbeat, pacemaker (the cluster resource manager), and cluster-glue. The HA principles are basically the same as in Heartbeat 2.x, and the configuration is largely unchanged; with the separated architecture, each piece can work in combination with other components.
The first official Heartbeat 3 release is 3.0.2. Resource management, formerly handled by the CRM, is replaced by Pacemaker, and the underlying messaging layer can still be heartbeat v3, or an alternative such as corosync. This article does not cover the details; see clusterlabs.org for more.
III. Heartbeat cluster processing
Any action in a cluster results in changes to the cluster as a whole. Such actions include adding or removing a cluster resource or changing resource constraints. When performing such operations, it is important to understand what happens in the cluster.
For example, suppose you need to add a cluster IP address resource. To do this, you use the cibadmin command-line tool or the Heartbeat GUI tool to modify the master CIB. It is not necessary to run cibadmin or the GUI on the designated coordinator: you can use either tool on any node in the cluster, and the local CIB will relay the requested change to the designated coordinator. The DC then replicates the CIB change to all cluster nodes and starts the transition.
With the help of the policy engine and transition engine, the designated coordinator obtains the series of steps needed to carry out the change, possibly including steps on multiple nodes. The designated coordinator sends the commands to the other cluster resource managers through the messaging layer.
If necessary, the other cluster resource managers use their local resource managers to modify the resources and return the results to the designated coordinator. Once the TE on the designated coordinator concludes that all necessary operations in the cluster have completed successfully, the cluster returns to the idle state and waits for further events.
If an operation does not go according to plan, the new information is recorded in the CIB and the policy engine is invoked again.
The same process occurs when a service or node dies. The designated coordinator is notified by the cluster membership service (if a node dies) or by a local resource manager (if a monitor operation fails). The designated coordinator then decides what actions are needed to bring the cluster to a new state, and that new cluster state is represented by a new CIB.