Thursday, March 13, 2014

Hard disk...


I'm about to purchase a SATA hard drive. Aside from storage capacity, are there any other factors I should keep in mind? I care above all about reliability. Is a more expensive drive less error-prone than a low-end one?


Consider reading Pinheiro et al. (2007), "Failure Trends in a Large Disk Drive Population", Proceedings of the 5th USENIX Conference on File and Storage Technologies, Feb 2007. Just search for the title on Google.

In general, drives from the same manufacturer are built to the same basic specification as far as the disk assembly is concerned. It is usually the tolerances that differ. To give an example, if you want a paper circle of 5 cm diameter, a circle of 4.5 or 5.5 cm may be acceptable for one use (e.g. decorating a child's room at home), but a circle of 5.0 cm, plus or minus 1 mm (i.e. within 4.9 - 5.1 cm), would be required if it is a decoration for a product launch at some big, big company.

For example, the load/unload cycle specification of a home drive may be ~300,000 cycles, while that of an enterprise drive would be ~600,000 cycles, twice as many. The tighter specification also applies to the drive assembly and the disk manufacturing process, and thus the non-recoverable read error rate is much smaller for enterprise drives. A typical, current home drive, the Caviar Black (from Western Digital), is specified at one non-recoverable read error per 10^14 bits read. Compare that with a typical hard drive built for datacenter servers, the WD RE SAS, which is specified at one non-recoverable read error per 10^15 bits read. Whether that tenfold improvement in reliability matters to you is another matter.
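To get a feel for what the two error-rate figures above mean in practice, here is a rough sketch. It assumes bit errors are independent and occur at exactly the quoted rate (a simplification; the datasheet figure is better read as an upper bound), and the 2 TB read size and the function name are illustrative choices, not taken from any datasheet.

```python
import math

def p_unrecoverable_read_error(bytes_read: float, bits_per_error: float) -> float:
    """Probability of at least one non-recoverable read error.

    Assumes independent bit errors at a rate of 1 per `bits_per_error`
    bits read (a simplifying assumption, not a manufacturer's model).
    """
    bits = bytes_read * 8
    # 1 - (1 - p)^bits, computed with log1p/expm1 to stay accurate for tiny p
    return -math.expm1(bits * math.log1p(-1.0 / bits_per_error))

full_read_bytes = 2e12  # hypothetical: reading a 2 TB drive end to end

for label, rate in (("1 per 10^14 bits", 1e14), ("1 per 10^15 bits", 1e15)):
    p = p_unrecoverable_read_error(full_read_bytes, rate)
    print(f"{label}: {p:.1%} chance of at least one error")
```

Under these assumptions, one full read of 2 TB gives roughly a 15% chance of hitting an error at the 10^14 rate and roughly 1.6% at the 10^15 rate. That is mainly relevant when a whole drive must be read without error, for instance during a RAID rebuild.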

To be honest, how you use the drive is likely more important than which drive you use. Below is a summary of Google's findings:
  • 6-7% of drives fail within the first year of use, and more than half of those fail within the first 6 months. These drives tend to be heavily utilized during that period.
  • Drive failures follow a double-peak model. The first peak is within 3 months, and the second peak is at around 3 years.
  • After the first year, there is in general an 8% annual failure rate.
  • The effect of temperature is twofold: [1] The lowest failure rate is seen in disks run at around 40 degrees C. [2] As the drive ages, the failure rate rises exponentially with temperature from the third year. To interpret this: running the drive at ~35C would achieve the best compromise between longevity and early failures. If your hard drive can be replaced every 2 years, running it as hot as 45C would in fact decrease the failure rate in general, but past the second year there will be an exponential increase if you keep running it at 45C.
  • If you use SMART reporting software (a nice one is CrystalDiskInfo) and you see one scan error, 10% of such drives will fail within days, and 30% will fail within 6 months. So back up and discard the drive as soon as you see the first one. If you see a reallocation event, 10% will fail within ~4 months. Note, however, that only 60% of all hard drive failures are predicted by the SMART system.
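The SMART rule of thumb from the last bullet can be written down as a tiny decision helper. This is only a sketch: the function name and its two counters are hypothetical, you would read the actual values off your SMART tool by hand, and the advice for reallocation events is my own reading of the figures above rather than something stated in the study.

```python
def smart_advice(scan_errors: int, reallocation_events: int) -> str:
    """Turn two SMART counters into a rough recommendation.

    Hypothetical helper: the thresholds (any non-zero count) follow the
    rule of thumb in the text, not any vendor's official guidance.
    """
    if scan_errors > 0:
        # Text above: after the first scan error, back up and discard the drive.
        return "back up now and replace the drive"
    if reallocation_events > 0:
        # Assumption: treat a reallocation as a warning, not an emergency.
        return "back up now and plan a replacement"
    # SMART predicts only some failures, so a clean report is no guarantee.
    return "no warning signs, but keep regular backups"

print(smart_advice(scan_errors=1, reallocation_events=0))
print(smart_advice(scan_errors=0, reallocation_events=3))
print(smart_advice(scan_errors=0, reallocation_events=0))
```

The last branch matters as much as the first two: since a large share of failures arrive without any SMART warning, a clean report is not a substitute for backups.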

Mean time between failures (MTBF) is basically not very useful for the typical consumer. The quoted figure is usually ideal and theoretical. Let's say we have 500,000 drives with an MTBF of 500,000 hours: if you run all of them together, you should expect, statistically speaking, about one of them to fail every hour, provided you run them within their specification (temperature, humidity, power supply quality...). With reference to the Google study, the realistic useful life of a hard drive used 24 hours a day would be more like 2 years (in a non-redundant system) or 3 years (in a redundant system). In a redundant system (e.g. RAID 5 or RAID 6) you can lose a hard drive without losing data. In RAID 6 in particular, you can lose a hard drive and still have redundancy during the rebuild process.
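The MTBF arithmetic above is easy to check. The sketch below assumes a constant failure rate (the usual exponential model behind MTBF figures), which is exactly the idealization the text warns about; the function names are mine.

```python
import math

HOURS_PER_YEAR = 24 * 365  # drive running around the clock

def expected_failures_per_hour(n_drives: int, mtbf_hours: float) -> float:
    """Average number of failures per hour across a fleet of drives."""
    return n_drives / mtbf_hours

def annualized_failure_rate(mtbf_hours: float) -> float:
    """Chance that one drive fails within a year of 24/7 use,
    assuming a constant failure rate of 1/MTBF (exponential model)."""
    return 1.0 - math.exp(-HOURS_PER_YEAR / mtbf_hours)

print(expected_failures_per_hour(500_000, 500_000))  # fleet from the text
print(f"{annualized_failure_rate(500_000):.2%}")
```

In this model an MTBF of 500,000 hours works out to an annual failure rate of under 2%, well below the roughly 8% per year quoted in the summary above. That gap is the reason to treat MTBF as a theoretical figure and not as a prediction for your own drive.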

Service life

One often sees a manufacturer quoting a service life such as '5 years' and then offering a warranty of '3 years'. Translation: "We believe that it should last some 5 years. If it fails within the first three years of use, we'll replace it at our cost, but if it fails between the 3rd and 5th year, poor you. We certainly haven't installed some sort of time bomb to make it unusable by its fifth birthday, but you should get a new hard disk and use that instead of this 5-year-old one if your data is at all precious."
