Distributed Replicated Block Device (DRBD) is a software-based, shared-nothing storage replication solution that mirrors the content of block devices between servers. Its core functionality is implemented as a Linux kernel module that sits below the file system, close to the operating system kernel and its I/O stack. DRBD consists of this kernel module plus associated userland scripts, and is used to build high-availability clusters: it mirrors an entire block device over the network, so you can think of it as a network RAID. It lets you maintain a real-time replica of a block device on a remote machine.
One, DRBD mirroring features and how it works
Real-time: when an application modifies data on the disk, replication happens immediately.
Transparency: applications storing data on the mirrored device do not need to know that the data actually lives on several servers; replication is independent of and transparent to them.
Synchronous mirroring and asynchronous mirroring:
a, Synchronous mirroring: when a local application issues a write, the write is performed synchronously on both servers before it is considered complete.
b, Asynchronous mirroring: the write is reported complete as soon as the local write operation finishes; the corresponding write on the peer server begins afterwards.
file system -> buffer cache -> DRBD -> disk scheduler -> disk driver
Two, DRBD basic features
1, Resources
DRBD manages and controls replication at the level of disk resources. To the DRBD module, a resource is the collective term for everything needed to replicate a particular storage device; in general, any block storage device can be replicated. A resource comprises:
Resource name: any US-ASCII characters except whitespace may be used.
DRBD device: a virtual block device with major number 147; its minor numbers are assigned in sequence starting from 0 by default. The associated block device is named /dev/drbdM, where M is the device's minor number.
Metadata: DRBD keeps the internal bookkeeping data it needs about the replicated data in a dedicated metadata area.
Connection: the communication link the peers use to exchange the replicated data.
2, Resource roles
DRBD roles: primary <-> secondary
Primary: a DRBD device in the primary role can be read and written without restriction. It can be used to create and mount a file system, for direct I/O to the raw device, and so on.
Secondary: it receives all updates from the peer node but otherwise allows no access: the device can be neither read nor written. The point of forbidding access is to keep caches and the replicated data consistent.
Roles can be changed manually or by an automatic cluster management program. A resource can be promoted from secondary to primary, and a primary can be demoted to secondary.
Replication traffic integrity checking (supported algorithms: MD5, SHA-1, CRC-32C)
This feature guards against data corruption introduced by network transmission during replication. DRBD generates a checksum (message digest) for every block it replicates, and the peer uses it to verify data integrity; if the checksum computed for a received block does not match the one computed on the source side, the block is retransmitted.
Configuration directive: data-integrity-alg <algorithm>;
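As a sketch, this directive goes in the net section of a resource definition in /etc/drbd.conf; the resource name r0 below is a placeholder, not taken from the original text:

```
resource r0 {
  net {
    data-integrity-alg sha1;  # checksum every replicated block with SHA-1
  }
}
```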
Online device verification
Even if integrity checking of the replication traffic is not enabled, we can still verify the devices online, following the same principle as above; for example, a scheduled task can validate the data periodically. Online device verification is disabled by default and can be enabled by adding the following to the /etc/drbd.conf configuration file:
resource <resource> {
  net {
    verify-alg <algorithm>;
  }
}
To run a verification pass: drbdadm verify <resource>
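Putting the pieces together, a hedged example of enabling online verification and driving it from a periodic task (the resource name r0 and the weekly schedule are assumptions for illustration):

```
resource r0 {
  net {
    verify-alg crc32c;  # algorithm used for online verification
  }
}

# /etc/cron.d entry: verify resource r0 every Sunday at 00:42
42 0 * * 0  root  /sbin/drbdadm verify r0
```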
Disk I/O error handling strategies
What strategy should be used when a disk I/O error occurs? DRBD offers three: detach, pass_on, and call-local-io-error.
detach: This is the default and recommended option. If a node's lower-level disk suffers an I/O error, the node detaches the backing device and continues in diskless mode. All I/O on that node is then served from the peer node; performance drops, but the service stays available, which clearly makes this the strategy of choice in a high-availability setup.
pass_on: DRBD reports the I/O error to the upper layers. On the primary node it is reported to the mounted file system; on the secondary node there is no upper layer to report to, so the error is ignored there.
call-local-io-error: invokes a locally defined I/O error handler. This requires a local-io-error handler command to be configured for the resource; DRBD calls that command to handle the error. This gives the administrator full freedom to handle I/O errors with any command or script of their choosing.
Configuration directive: on-io-error <strategy>;
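For illustration, the strategy is set in the disk section of the resource definition; r0 is a placeholder name:

```
resource r0 {
  disk {
    on-io-error detach;  # default/recommended: drop the backing disk, continue diskless
  }
}
```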
Outdated data handling strategy
Outdated data is not inconsistent data; it means the secondary is no longer synchronized with the primary, so the secondary effectively holds a snapshot. If a failover happens at that moment, data consistency problems obviously follow, so we need policies to prevent it: when data becomes outdated, DRBD changes the connection state from Connected to WFConnection, and a cluster manager such as Pacemaker will then refuse to promote a node holding outdated data to primary.
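One common way to enforce this with Pacemaker, sketched here under the assumption that the stock drbd-utils helper scripts are installed at their usual paths, is resource-level fencing that outdates the peer:

```
resource r0 {
  disk {
    fencing resource-only;
  }
  handlers {
    fence-peer "/usr/lib/drbd/crm-fence-peer.sh";
    after-resync-target "/usr/lib/drbd/crm-unfence-peer.sh";
  }
}
```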
Suspended replication
On a poor network link, replicating with protocol C can introduce severe write latency. In that situation we can use the suspended-replication strategy: when the link degrades, the primary pauses replication, the primary and secondary temporarily fall out of sync, and replication resumes once bandwidth becomes available again.
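A minimal sketch of such a policy in DRBD 8.4-style syntax (the thresholds shown are illustrative assumptions, not recommendations):

```
resource r0 {
  net {
    on-congestion pull-ahead;  # pause replication instead of blocking writers
    congestion-fill 2G;        # consider the link congested when this much data is in flight
    congestion-extents 2000;   # ... or when this many activity-log extents are ahead
  }
}
```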
Synchronization rate configuration
The synchronization rate can be configured according to the available network bandwidth and device resources, as a permanent rate, a temporary rate, or a variable rate. Setting the sync rate above the maximum available network bandwidth makes no sense. Note that the rate is specified in bytes per second, not bits per second. Experience suggests that a sync rate of about 30% of the available bandwidth is reasonable.
The formula: rate = MIN(disk I/O throughput, network I/O throughput) * 0.3
For example, if the I/O subsystem can sustain 180 MB/s of throughput while Gigabit Ethernet delivers 110 MB/s, the network bandwidth is the bottleneck, and the sync rate works out to 110 * 0.3 = 33 MB/s.
If instead the I/O subsystem can sustain only 80 MB/s against the same 110 MB/s network, disk I/O is the bottleneck, and the sync rate works out to 80 * 0.3 = 24 MB/s.
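The rule-of-thumb calculation above can be expressed as a small helper; this is just the formula from the text, nothing DRBD-specific:

```python
def sync_rate(disk_mb_s: float, net_mb_s: float, fraction: float = 0.3) -> float:
    """Suggested DRBD sync rate in MB/s: a fraction (30% by default)
    of the slower of the disk and network throughput."""
    return min(disk_mb_s, net_mb_s) * fraction

# Gigabit Ethernet (110 MB/s) vs. a 180 MB/s I/O subsystem: network is the bottleneck
print(sync_rate(180, 110))  # -> 33.0
# The same network vs. an 80 MB/s I/O subsystem: disk is the bottleneck
print(sync_rate(80, 110))   # -> 24.0
```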
Split brain notification and automatic recovery
Split brain refers to the situation where, after the DRBD connection between the two nodes is lost, both nodes run as primary. When a node whose DRBD connection is down prepares to reconnect and finds that the peer is also in the primary state, it concludes that a split brain has occurred, immediately drops the connection, and records the following message in the system log: "Split-Brain detected, dropping connection!" After a split brain, if you check the connection state, at least one node will be in StandAlone; the other may be StandAlone as well (if it also detected the split brain) or may be in WFConnection.
DRBD supports the following automatic split-brain recovery policies:
a, Discarding modifications made on the younger primary
--- the changes made on the node that became primary later are discarded
b, Discarding modifications made on the older primary
--- the changes made on the node that became primary first are discarded
c, Discarding modifications on the primary with fewer changes
--- the changes on whichever node modified less data are discarded
d, Graceful recovery from split brain if one host has had no intermediate changes
--- if one node made no changes at all, recovery proceeds directly
DRBD's split-brain behavior is configured in the net section, using the following keywords:
after-sb-0pri: a split brain has been detected, but neither node is currently in the primary role. For this option DRBD accepts the following keywords:
disconnect: do not attempt automatic recovery; just call the split-brain handler script (if configured), drop the connection, and continue in disconnected mode.
discard-younger-primary: discard and roll back the modifications made on the host that became primary last.
discard-least-changes: discard and roll back the modifications on the host with the fewer changes.
discard-zero-changes: if one of the nodes made no changes at all, simply apply the changes made on the other node and continue.
after-sb-1pri: a split brain has been detected, and one node is currently in the primary role. For this option DRBD accepts the following keywords:
disconnect: as with after-sb-0pri, call the split-brain handler script (if configured), drop the connection, and continue in disconnected mode.
consensus: apply the same recovery policies as for after-sb-0pri. If the split-brain victim can be chosen by those policies, resolve automatically; otherwise behave exactly like disconnect.
call-pri-lost-after-sb: apply the same recovery policies as for after-sb-0pri. If a split-brain victim can be chosen, invoke the pri-lost-after-sb handler on the victim node. This handler must be configured in the handlers section, and it is expected to remove the node from the cluster.
discard-secondary: whichever host is currently in the secondary role is made the split-brain victim, regardless of anything else.
after-sb-2pri: a split brain has been detected and both nodes are in the primary role. This option accepts the same keywords as after-sb-1pri, except discard-secondary and consensus.
These policies are configured in /etc/drbd.conf, for example:
handlers {
  split-brain "/usr/lib/drbd/notify-split-brain.sh root";  # the handler may be any executable on the system; this script sends a notification email to the given address
}
net {
  after-sb-0pri discard-zero-changes;  # split-brain automatic recovery policy
}
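The same net section can also state policies for the one-primary and two-primary cases; a hedged sketch in which the specific choices are illustrative and r0 is a placeholder name:

```
resource r0 {
  net {
    after-sb-0pri discard-zero-changes;  # no primaries: keep the side with no changes
    after-sb-1pri discard-secondary;     # one primary: the secondary is the victim
    after-sb-2pri disconnect;            # two primaries: do not auto-resolve
  }
}
```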
To recover from a split brain manually, first make sure the node whose modifications will be discarded (the split-brain victim) is switched to the secondary role, and tell it to drop its version of the data on reconnect:
drbdadm secondary <resource>
drbdadm connect --discard-my-data <resource>
Then, on the surviving primary node, reconnect to the secondary (this can be omitted if that node is currently in the WFConnection state):
drbdadm connect <resource>
Once these steps are done, resynchronization from the new primary to the secondary starts automatically.
Three, DRBD replication modes and replication protocols
1, Replication modes
Single-primary mode:
In single-primary mode, each resource has exactly one primary node in the cluster at any given time. Because only one cluster node can manipulate the data at any moment, this mode works with any conventional file system (ext3, ext4, XFS, etc.).
Dual-primary mode:
In dual-primary mode, a resource has two primary nodes in the cluster at any given time. Since both sides may access the data concurrently, this mode requires a shared cluster file system built on a distributed lock manager, such as GFS or OCFS2. Dual-primary mode is used to deploy DRBD in load-balanced clusters, where either of the two primary nodes can be chosen for concurrent data access. The mode is disabled by default and must be explicitly enabled in the configuration file. (Supported since DRBD 8.0.)
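Enabling dual-primary is a small change in the net section; the sketch below assumes DRBD 8.4-style syntax and a placeholder resource name r0:

```
resource r0 {
  net {
    protocol C;               # dual-primary requires synchronous replication
    allow-two-primaries yes;  # disabled by default
  }
}
```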
2, Replication protocols
Protocol A (asynchronous): a write is considered complete as soon as the data has been written to the local disk and handed to the local TCP send buffer. If a node fails, data still in the send queue for the remote node may be lost.
Protocol B (memory synchronous, or semi-synchronous): a write is considered complete when the peer's receipt acknowledgement arrives, i.e. the data has reached the remote node's memory. Data is lost only if both participating nodes fail simultaneously.
Protocol C (synchronous): a write is considered complete only when the peer's write confirmation arrives, i.e. the data has reached the remote disk. No data is lost; this is the most common configuration in practice, though I/O throughput then depends on the network bandwidth.
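The protocol is selected per resource in the configuration; a hedged sketch using DRBD 8.4-style syntax with a placeholder resource name:

```
resource r0 {
  net {
    protocol C;  # synchronous; A (asynchronous) and B (memory synchronous) are the alternatives
  }
}
```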