If there is one thing Linux users never want to see, it is a hard drive dying without warning. Backups and storage technologies such as RAID can get you back on your feet quickly, but the cost of data loss caused by a sudden hardware failure can be considerable, especially if you have never planned what to do in such a situation.
To avoid this predicament, you can try a package called smartmontools, which manages and monitors storage hardware by using Self-Monitoring, Analysis and Reporting Technology (abbreviated S.M.A.R.T. or SMART). Today, most ATA/SATA and SCSI/SAS hard disks and SSDs have a SMART system built in. The purpose of SMART is to monitor drive reliability, predict drive failures and carry out various types of drive self-tests. smartmontools consists of two utilities, smartctl and smartd, which together provide advance warning of disk degradation and failure on Linux platforms.
This article describes how to install and configure smartmontools on Linux.
Installing smartmontools
Since smartmontools is available in the base software repositories of most Linux distributions, installation is straightforward.
Debian and its derivatives:
# aptitude install smartmontools
Red Hat-based distributions:
# yum install smartmontools
Checking hard drive health with smartctl
First, use the following command to list the disks connected to the system:
# ls -l /dev | grep -E 'sd|hd'
In the output, sdX is the device name assigned to the corresponding hard disk on the machine.
To display information about a given hard disk (such as device model, serial number, firmware version, size, ATA version/revision number, and the availability and status of SMART), run smartctl with the "--info" option and specify the device name of the disk as shown below.
In this example, we choose /dev/sda.
# smartctl --info /dev/sda
Although you may not pay attention to the ATA version at first (Translator's note: ATA is a hard disk interface technology), it is one of the most important factors when you actually need to replace a hard disk. Each ATA version is backward compatible with the previous ones. For example, older ATA-1 or ATA-2 devices work fine on ATA-6 and ATA-7 interfaces, but not the other way around. When the device version and the interface version do not match, the two run according to the lower of the two specifications. So in this case, if you need to replace the hard drive, an ATA-7 hard drive is the safest choice.
You can check the health of a particular disk with this command:
# smartctl -s on -a /dev/sda
In this command, the "-s on" flag enables SMART on the specified device. If SMART support is already enabled on /dev/sda, you can omit it.
The SMART information for a disk consists of several sections. Among them, the "READ SMART DATA" section shows the overall health of the disk.
=== START OF READ SMART DATA SECTION ===
SMART overall-health self-assessment test result: PASSED
The result of this test is either PASSED or FAILED. The latter indicates an imminent hardware failure, so you should start backing up the important data on that disk right away!
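If you want to act on this result from a script rather than read it by eye, the overall-health line can be reduced to a single word. The following is only a minimal sketch: health_status is a hypothetical helper name, it assumes the ATA-style result line shown above (SCSI/SAS drives report health with different wording), and it reads smartctl output on standard input.

```shell
#!/bin/sh
# Sketch: reduce smartctl's overall-health line to PASSED, FAILED or UNKNOWN.
# health_status is a hypothetical helper; it reads a smartctl report on stdin.
health_status() {
    # Read the whole report once, then look for the ATA overall-health line.
    report=$(cat)
    case "$report" in
        *"self-assessment test result: PASSED"*) echo PASSED ;;
        *"self-assessment test result: FAILED"*) echo FAILED ;;
        *) echo UNKNOWN ;;
    esac
}

# Typical use (needs root and a real disk):
#   smartctl -H /dev/sda | health_status
```

The "-H" option asks smartctl for the health status only. A script could then, for example, send a notification whenever the helper prints anything other than PASSED.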
The SMART attribute table lists the values of a number of attributes that the manufacturer has defined for the drive, together with the failure threshold for each attribute. This table is populated and updated automatically by the drive firmware.
ID: the attribute ID, usually a decimal (or hexadecimal) number between 1 and 255.
ATTRIBUTE_NAME: the attribute name defined by the drive manufacturer.
FLAG: the attribute handling flag (can be ignored).
VALUE: one of the most important pieces of information in the table, representing the normalized value of the given attribute, in the range 1 to 253. 253 is the best case and 1 is the worst case. Depending on the attribute and the manufacturer, the initial VALUE can be set to either 100 or 200.
WORST: the lowest VALUE ever recorded.
THRESH: the lowest value that WORST may reach before the disk is reported as FAILED.
TYPE: the type of the attribute (Pre-fail or Old_age). A Pre-fail attribute is considered a critical attribute, one that takes part in the overall SMART health assessment (PASSED/FAILED) of the disk. If any Pre-fail attribute fails, the disk is considered to be about to fail. An Old_age attribute, on the other hand, is considered a non-critical attribute (such as normal wear and tear), one that does not fail the disk by itself.
UPDATED: indicates how often the attribute is updated. Offline means the attribute is updated only while offline tests are being performed on the disk.
WHEN_FAILED: set to "FAILING_NOW" if VALUE is less than or equal to THRESH, to "In_the_past" if WORST is less than or equal to THRESH, and to "-" otherwise. In the "FAILING_NOW" case, back up your important files as soon as possible, especially if the attribute is of type Pre-fail. "In_the_past" means that the attribute has failed before, but was fine at the time the test was run. "-" means that the attribute has never failed.
RAW_VALUE: a manufacturer-defined raw value, from which VALUE is derived.
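The VALUE and THRESH rule described above can be checked mechanically. The sketch below is an illustration, not part of smartmontools: failing_attributes is a hypothetical helper that reads the attribute table on standard input and prints the ID, name and type of every attribute whose VALUE is at or below a non-zero THRESH. It assumes the usual column order (ID, ATTRIBUTE_NAME, FLAG, VALUE, WORST, THRESH, TYPE, ...).

```shell
#!/bin/sh
# Sketch: list attributes whose normalized VALUE has reached the threshold.
# failing_attributes is a hypothetical helper; it reads the table on stdin.
failing_attributes() {
    # Column 1 = ID, 2 = name, 4 = VALUE, 6 = THRESH, 7 = TYPE.
    # Only rows starting with a numeric ID are attribute rows.
    # A THRESH of 0 is skipped, since VALUE (at least 1) cannot reach it.
    awk '$1 ~ /^[0-9]+$/ && $6 + 0 > 0 && $4 + 0 <= $6 + 0 { print $1, $2, $7 }'
}

# Typical use (needs root and a real disk):
#   smartctl -A /dev/sda | failing_attributes
```

The "-A" option prints only the attribute table. An empty result means no attribute is currently at or below its threshold.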
At this point you may be thinking, "Yes, smartctl looks like a nice tool, but I would like to avoid having to run it manually. Wouldn't it be better if it could run at specified intervals and tell me the test results?"
The good news is that this feature already exists. This is where smartd comes in!
Configuring real-time monitoring with smartctl and smartd
First, edit the smartmontools configuration file (/etc/default/smartmontools) so that smartd starts at system boot, and set the check interval in seconds (e.g., 7200 = 2 hours).
start_smartd=yes
smartd_opts="--interval=7200"
Next, edit the smartd configuration file (/etc/smartd.conf) and add the following line.
/dev/sda -m myemail@mydomain.com -M test
-m: specifies the email address to send test reports to. This can be a system user such as root, or, if the server has been configured to send email outside the system, an address such as myemail@mydomain.com.
-M: specifies how reports are delivered.
once: send only one warning email for each type of disk problem detected.
daily: send an additional reminder email once per day for each type of disk problem detected.
diminishing: send additional reminder emails for each type of problem detected, first after one day, then after two days, then after four days, and so on. Each interval is twice the previous one.
test: send a single test email immediately when smartd starts.
exec PATH: run the executable PATH instead of the default mail command. PATH must point to an executable binary file or script. This lets you specify a desired action (beep the console, shut down the system, and so on) when a problem is detected.
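As an illustration of the "exec PATH" option, here is a minimal sketch of a handler script that appends each warning to a log file instead of sending mail. It relies on the SMARTD_DEVICE, SMARTD_FAILTYPE and SMARTD_MESSAGE environment variables, which smartd sets before running the program according to the smartd.conf manual. The function name log_smartd_warning, the script location and the log path are all hypothetical.

```shell
#!/bin/sh
# Sketch of a "-M exec" handler: append one line per smartd warning to a log.
# log_smartd_warning is a hypothetical helper; $1 is the log file to append to.
log_smartd_warning() {
    logfile=$1
    # smartd exports SMARTD_* variables describing the problem; fall back to
    # placeholder text if the script is run by hand without them.
    printf '%s device=%s type=%s msg=%s\n' \
        "$(date '+%Y-%m-%d %H:%M:%S')" \
        "${SMARTD_DEVICE:-unknown}" \
        "${SMARTD_FAILTYPE:-unknown}" \
        "${SMARTD_MESSAGE:-none}" >> "$logfile"
}

# When installed as a real handler, uncomment the call below
# (the log path is an assumption, pick one that suits your system):
#   log_smartd_warning /var/log/smartd-warnings.log
```

Assuming the script is saved as /usr/local/bin/smartd-log.sh and made executable, the matching smartd.conf line would look like "/dev/sda -m root -M exec /usr/local/bin/smartd-log.sh". The "-m" directive is still needed, because the "-M" options only take effect together with it.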
Save the changes and restart smartd.
smartd should then send a test email. In that test message no error is reported. If an error had actually been detected, it would appear below the line "The following warning/error was logged by the smartd daemon".
Finally, you can schedule tests to run at the times you want by using the "-s" flag with a regular expression of the form "T/MM/DD/d/HH", where:
T in the regular expression is the type of test:
L: long test
S: short test
C: conveyance test (ATA only)
O: offline test (ATA only)
The remaining characters represent the date and time of the test:
MM is the month of the year.
DD is the day of the month.
HH is the hour of the day.
d is the day of the week (from 1 = Monday to 7 = Sunday).
MM, DD and HH are each written with two decimal digits.
A dot in such an expression stands for any possible value. An expression in parentheses of the form '(A|B|C)' matches any one of the three values A, B and C. An expression in square brackets of the form [1-5] denotes the range 1 to 5 (inclusive).
For example, to run a long test on all disks every weekday at 1 PM, add the following line to /etc/smartd.conf. Make sure to restart smartd after you finish editing.
DEVICESCAN -s (L/../../[1-5]/13)
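Before putting a schedule expression into /etc/smartd.conf, it can help to try it against a few sample "T/MM/DD/d/HH" strings. The sketch below does this with grep and is only an approximation of what smartd does: schedule_matches and now_string are hypothetical helpers, and the sketch assumes that the expression is an extended regular expression that must match the whole string.

```shell
#!/bin/sh
# Sketch: test a smartd "-s" expression against "T/MM/DD/d/HH" strings.
# Both helpers are hypothetical and not part of smartmontools.

# schedule_matches REGEX STRING: succeeds if REGEX matches the whole STRING.
schedule_matches() {
    printf '%s\n' "$2" | grep -Eqx -- "$1"
}

# now_string T: print the current time in "T/MM/DD/d/HH" form for test type T.
# date's %u gives the day of the week as 1 (Monday) to 7 (Sunday).
now_string() {
    printf '%s/%s\n' "$1" "$(date '+%m/%d/%u/%H')"
}

# Example: would a long test be due right now under the weekday 1 PM schedule?
#   schedule_matches '(L/../../[1-5]/13)' "$(now_string L)" && echo "due now"
```

With the expression above, a string such as L/03/15/3/13 (a Wednesday at 13:00) matches, while L/03/15/6/13 (a Saturday) does not.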
Summary
Whether you want to quickly check the electrical and mechanical performance of a disk or run a longer scan test of the entire disk, do not get so caught up in your day-to-day work that you forget to check the health of your disks regularly. Keep an eye on disk health, and you will thank yourself later!