hadoop-general mailing list archives

From Brian Bockelman <bbock...@cse.unl.edu>
Subject Re: Dedicated disk for operating system
Date Wed, 10 Aug 2011 19:31:54 GMT
MTTF is a difficult number to pin down.  Two of the more popular papers on the topic:
http://db.usenix.org/events/fast07/tech/schroeder/schroeder_html/index.html
http://labs.google.com/papers/disk_failures.pdf

Ted is assuming an MTTF of ~25k hours; I think that's overly pessimistic, although both papers
indicate that MTTF is a crappy way to model disk lifetime.

I think a lot depends on the quality of the batch of hard drives you get and on the operating
conditions.
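
For what it's worth, here's a minimal back-of-the-envelope sketch of the arithmetic behind
the numbers quoted below, assuming independent drives with exponential (constant-rate)
lifetimes and the ~25k-hour MTTF figure; the Python and the helper name are mine, not Ted's:

  # Sketch only: constant-rate (exponential) failures, MTTF ~= 25,000 hours
  # (~1000 days).  This is the assumption under discussion, not a claim
  # about how drives actually fail.
  MTTF_HOURS = 25_000

  def expected_failures_per_day(n_drives, mttf_hours=MTTF_HOURS):
      # Mean failures per day if each drive fails independently at a
      # constant rate of 1/MTTF.
      return n_drives * 24.0 / mttf_hours

  for n_drives in (1_000, 10_000, 100_000):
      per_day = expected_failures_per_day(n_drives)
      print(f"{n_drives} drives: {per_day:.1f} failures/day "
            f"(~one every {24.0 / per_day:.1f} hours)")

The 100 node x 10 disk case from Ted's mail comes out at roughly one failure per day, and
the one-every-2.5-hours figure corresponds to about 10,000 drives at this MTTF.  As Luke
points out below, all of this stands or falls with the constant-rate assumption.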

Brian

On Aug 10, 2011, at 2:19 PM, Luke Lu wrote:

> On Wed, Aug 10, 2011 at 10:40 AM, Ted Dunning <tdunning@maprtech.com> wrote:
>> To be specific, taking a 100 node x 10 disk x 2 TB configuration with drive
>> MTBF of 1000 days, we should be seeing drive failures on average once per
>> day....
>> For a 10,000 node cluster, however, we should expect an average disk
>> failure rate of one failure every 2.5 hours.
> 
> Do you have real data to back the analysis? You assume a uniform disk
> failure distribution, which is absolutely not true. I can only say
> that our ops data across 40000+ nodes shows that the above analysis is
> not even close. (This is assuming that the ops know what they are
> doing though :)
> 
> __Luke

