In this article, we will discuss how to rebuild a software RAID array without data loss when a disk fails. For brevity, we will consider only a RAID 1 setup, but the methods and concepts apply to all configurations.
Our RAID testing setup
Before going any further, make sure you have set up a RAID 1 array following the process described in Part 3 of this series: How to Create a RAID 1 (Mirror) in Linux.
The only changes in the present case are:
A different CentOS version (v7) instead of the one used in that article (v6.5).
Different disk capacities for /dev/sdb and /dev/sdc (8 GB each).
In addition, if SELinux is set to enforcing mode, you need to add the appropriate labels to the directory where the RAID device is mounted. Otherwise, you will run into this warning message when trying to mount it:
RAID Mount Error with SELinux Enabled
This can be fixed with the following command:
# restorecon -R /mnt/raid1
Configuring RAID monitoring
Storage devices fail for many reasons (although SSDs have greatly reduced the likelihood of this happening), and whatever the cause, you can be sure problems can occur at any time. You need to be ready to replace the failed part and to ensure data availability and integrity.
First of all, a suggestion. Although you can inspect /proc/mdstat to check the status of the RAID, there is a better and more time-saving method: running mdadm in monitor + scan mode, which sends alerts to a predefined recipient via email.
To set this, add the following line in /etc/mdadm.conf:
MAILADDR user@<domain or localhost>
My own setting is as follows:
MAILADDR gacanepa@localhost
RAID Monitoring Email Alerts
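Putting it together, a minimal /etc/mdadm.conf might look like the following sketch. The ARRAY line here is illustrative; regenerate it on your own system with mdadm --detail --scan, and substitute your array's actual UUID:

```
# /etc/mdadm.conf -- minimal example (values are illustrative)
# Regenerate the ARRAY line with: mdadm --detail --scan
ARRAY /dev/md0 metadata=1.2 UUID=<your-array-uuid>
MAILADDR gacanepa@localhost
```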
To run mdadm in monitor + scan mode, add the following crontab entry as the root user:
@reboot /sbin/mdadm --monitor --scan --oneshot
By default, mdadm checks the RAID arrays every 60 seconds and sends an alert if it finds a problem. You can modify this behavior by adding the --delay option to the crontab entry above, followed by a number of seconds (for example, --delay 1800 means 30 minutes).
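For example, the crontab entry with a 30-minute check interval would read as follows (a sketch mirroring the entry above; the 1800-second value is just the example from the text):

```
@reboot /sbin/mdadm --monitor --scan --oneshot --delay 1800
```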
Finally, make sure you have a mail user agent (MUA) installed, such as mutt or mailx. Otherwise, you will not receive any alerts.
In a minute, we will see an alert sent by mdadm.
Simulating and replacing a failed RAID storage device
To simulate a storage device failure in the RAID array, we will use the --manage and --set-faulty options, as follows:
# mdadm --manage --set-faulty /dev/md0 /dev/sdc1
This will cause /dev/sdc1 to be marked as faulty, as we can see in /proc/mdstat:
Simulating a Faulty RAID Storage Device
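A faulty member shows up in /proc/mdstat with an (F) flag, and the [2/1] counter shows only one of two devices active. As a sketch (using captured sample output for illustration, since the real file requires a live array; on a real system read /proc/mdstat directly), you could detect a degraded mirror like this:

```shell
#!/bin/sh
# Sample /proc/mdstat output for a degraded RAID 1 (illustrative)
mdstat='md0 : active raid1 sdc1[1](F) sdb1[0]
      8387520 blocks super 1.2 [2/1] [U_]'

# (F) marks a faulty member; [2/1] means only one of two devices is active
if printf '%s\n' "$mdstat" | grep -q '(F)'; then
    echo "array degraded: faulty member present"
fi
```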
More importantly, let's check whether we received an email alert with the same warning:
RAID Email Alert on Device Failure
In this case, you need to remove the device from the software RAID array:
# mdadm /dev/md0 --remove /dev/sdc1
Then, you can physically remove it from the machine and replace it with the spare device (/dev/sdd, on which an fd-type partition was created earlier):
# mdadm --manage /dev/md0 --add /dev/sdd1
Fortunately, the system automatically begins rebuilding the array with the disk we just added. We can test this by marking /dev/sdb1 as faulty, removing it from the array, and making sure the file tecmint.txt is still accessible at /mnt/raid1:
# mdadm --detail /dev/md0
# mount | grep raid1
# ls -l /mnt/raid1 | grep tecmint
# cat /mnt/raid1/tecmint.txt
Confirm RAID rebuild
The image above clearly shows that after /dev/sdd1 was added to the array in place of /dev/sdc1, the data was rebuilt automatically without any intervention on our part.
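A quick way to convince yourself that the rebuild preserved the data is to record a checksum before the simulated failure and compare it afterwards. A minimal sketch, using a temporary file here instead of the real /mnt/raid1/tecmint.txt:

```shell
#!/bin/sh
# Illustrative only: on the real system the file would be /mnt/raid1/tecmint.txt
f=$(mktemp)
echo "RAID 1 mirror test" > "$f"

before=$(md5sum "$f" | awk '{print $1}')   # checksum taken before the simulated failure
# ... mark a member faulty here and let the array rebuild ...
after=$(md5sum "$f" | awk '{print $1}')    # checksum taken after the rebuild

[ "$before" = "$after" ] && echo "data intact after rebuild"
```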
Although not strictly required, it is a good idea to keep a spare device on hand, so that replacing a faulty device can be done in an instant. To that end, let us re-add /dev/sdb1 and /dev/sdc1:
# mdadm --manage /dev/md0 --add /dev/sdb1
# mdadm --manage /dev/md0 --add /dev/sdc1
Replacing a Failed Device in the RAID Array
Recovering data after loss of redundancy
As mentioned earlier, mdadm automatically rebuilds the data when one disk fails. But what happens if two disks in the array fail? Let us simulate that scenario by marking /dev/sdb1 and /dev/sdd1 as faulty:
# umount /mnt/raid1
# mdadm --manage --set-faulty /dev/md0 /dev/sdb1
# mdadm --stop /dev/md0
# mdadm --manage --set-faulty /dev/md0 /dev/sdd1
Trying to re-create the array at this point in the same way (or using the --assume-clean option) may result in data loss, so it should be left as a last resort.
Let us try to recover the data from /dev/sdb1, for example, onto a similar disk partition (/dev/sde1 - note that this requires creating an fd-type partition on /dev/sde beforehand) using ddrescue:
# ddrescue -r 2 /dev/sdb1 /dev/sde1
Recovering a RAID Array
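ddrescue copies everything it can read and retries bad areas (-r 2 means two retry passes over bad sectors). The same cloning idea can be sketched with ordinary files and plain dd, which behaves like ddrescue on an error-free source; the device names in the article are the real targets:

```shell
#!/bin/sh
# Stand-in for: ddrescue -r 2 /dev/sdb1 /dev/sde1
# We clone a small image file instead of a real partition here.
src=$(mktemp) && dst=$(mktemp)
dd if=/dev/urandom of="$src" bs=1024 count=64 2>/dev/null   # fake source "partition"
dd if="$src" of="$dst" bs=4096 2>/dev/null                  # block-level copy

# The clone must be byte-identical to the source
cmp -s "$src" "$dst" && same=yes || same=no
echo "clone identical: $same"
```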
Please note that, up to this point, we have not touched /dev/sdb or /dev/sdd, whose partitions were part of the RAID array.
Now, let's rebuild the array using /dev/sde1 and /dev/sdf1:
# mdadm --create /dev/md0 --level=mirror --raid-devices=2 /dev/sd[e-f]1
Note that in a real situation you would typically keep the same device names as the original array; that is, after the failed disks are replaced, the new devices would be /dev/sdb1 and /dev/sdc1.
In this article, I chose to use extra devices to re-create a brand-new array, in order to avoid confusion with the original failed disks.
When asked whether to continue writing to the array, type Y and press Enter. The array will then be started and you can watch its progress with:
# watch -n 1 cat /proc/mdstat
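While the rebuild runs, /proc/mdstat shows a progress line. A sketch of extracting the completion percentage from sample output (captured for illustration, since the real file only shows this during a live resync):

```shell
#!/bin/sh
# Sample /proc/mdstat during a RAID 1 resync (illustrative)
mdstat='md0 : active raid1 sdf1[1] sde1[0]
      8387520 blocks super 1.2 [2/1] [U_]
      [=====>...............]  recovery = 29.8% (2501760/8387520) finish=0.5min speed=178697K/sec'

# Pull the percentage out of the "recovery = N%" field
pct=$(printf '%s\n' "$mdstat" | sed -n 's/.*recovery = \([0-9.]*\)%.*/\1/p')
echo "rebuild at ${pct}%"   # → rebuild at 29.8%
```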
When this process is complete, you should be able to access RAID data:
Confirming RAID Data
Summary
In this article, we have reviewed how to recover from RAID failures and from loss of redundancy. However, remember that this technology is a storage solution, not a replacement for backups.
The methods described in this article apply to all RAID setups; the concepts behind them will be covered in the last article of this series (RAID management).