hadoop-common-user mailing list archives

From Raghu Angadi <rang...@yahoo-inc.com>
Subject Re: Datanode block scans
Date Thu, 13 Nov 2008 18:49:48 GMT

How often is safe enough depends on what probability of loss you are willing to accept.

I just checked on one of our clusters with 4PB of data: the scanner fixes 
about 1 block a day. Assuming an average block size of 64MB (pretty high), 
the probability that all 3 replicas of a block go bad within 3 weeks is on 
the order of 1e-12. In reality it is probably 2-3 orders of magnitude less 
likely.
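
A back-of-envelope version of that estimate, for anyone who wants to 
reproduce it. Assumptions (mine, not from the scanner): 4PB is raw 
capacity, 3x replication, 64MB blocks, independent replica failures.

public class ScanLossEstimate {
    public static void main(String[] args) {
        double rawBytes    = 4e15;   // ~4 PB raw on the cluster
        double blockBytes  = 64e6;   // ~64 MB average block size
        double replication = 3.0;

        double replicas = rawBytes / blockBytes;    // ~6.3e7 block replicas
        double blocks   = replicas / replication;   // ~2.1e7 distinct blocks

        double badPerDay   = 1.0;    // scanner fixes ~1 replica per day
        double windowDays  = 21.0;   // 3-week window
        double pReplicaBad = badPerDay * windowDays / replicas;   // ~3.4e-7

        // Chance that a given block loses all 3 replicas inside one window,
        // then the expected number of such blocks across the whole cluster.
        double pBlockLost   = Math.pow(pReplicaBad, replication); // ~3.8e-20
        double expectedLost = blocks * pBlockLost;                // ~8e-13

        System.out.printf("P(one replica bad in window)        = %.1e%n", pReplicaBad);
        System.out.printf("Expected blocks losing all replicas = %.1e%n", expectedLost);
    }
}

With those inputs the expected loss comes out just under 1e-12, consistent 
with the range quoted above.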

Raghu.

Brian Bockelman wrote:
> 
> On Nov 13, 2008, at 11:32 AM, Raghu Angadi wrote:
> 
>> Brian Bockelman wrote:
>>> Hey all,
>>> I noticed that the maximum throttle for the datanode block scanner is 
>>> hardcoded at 8MB/s.
>>> I think this is insufficient; on a fully loaded Sun Thumper, a full 
>>> scan at 8MB/s would take something like 70 days.
>>> Is it possible to make this throttle a bit smarter?  At the very 
>>> least, would anyone object to a patch which exposed this throttle as 
>>> a config option?  Alternately, a smarter idea would be to throttle 
>>> the block scanner at (8MB/s) * (# of volumes), under the assumption 
>>> that there is at least 1 disk per volume.
>>
>> Making the max configurable seems useful. Either of the above options 
>> is fine, though the first one might be simpler for configuration.
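
A minimal sketch of what option (1) could look like, with option (2) shown 
as a comment. The config key name here is made up purely for illustration; 
it is not an existing key.

import org.apache.hadoop.conf.Configuration;

// Sketch only: read the scanner ceiling from the configuration instead of
// a hard-coded constant.  The key name is hypothetical.
public class ScannerThrottleConfig {
    static final long DEFAULT_MAX_SCAN_RATE = 8 * 1024 * 1024;  // current 8 MB/s ceiling

    static long maxScanRate(Configuration conf, int numVolumes) {
        // Option (1): per-datanode ceiling taken from the config.
        long perNode = conf.getLong("dfs.datanode.scan.max.rate.bytes",
                                    DEFAULT_MAX_SCAN_RATE);
        // Option (2) would scale the default by volume count instead:
        //   return DEFAULT_MAX_SCAN_RATE * (long) numVolumes;
        return perNode;
    }
}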
>>
>> 8MB/s is calculated for around 4TB of data on a node. Given ~80k 
>> seconds in a day, that works out to around 6-7 days. 8-10 MB/s is not 
>> too bad a load on a 2-4 disk machine.
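
Spelling that arithmetic out (the 48TB case is the Thumper mentioned 
earlier in the thread):

public class ScanDays {
    // Days needed to read dataBytes once at a fixed scan rate.
    static double days(double dataBytes, double bytesPerSec) {
        return dataBytes / bytesPerSec / 86400.0;
    }

    public static void main(String[] args) {
        double MB = 1e6, TB = 1e12;
        System.out.printf(" 4 TB at 8 MB/s: %4.1f days%n", days(4 * TB,  8 * MB));  // ~5.8
        System.out.printf("48 TB at 8 MB/s: %4.1f days%n", days(48 * TB, 8 * MB));  // ~69
    }
}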
>>
>>> Hm... on second thought, even if the per-disk I/O is trivial, on the 
>>> Thumper example the per-volume throttle would add up to about 3Gbps: 
>>> that's a nontrivial load on the bus.
>>> How do other "big sites" handle this?  We're currently at 110TB raw, 
>>> are considering converting ~240TB over from another file system, and 
>>> are planning to grow to 800TB during 2009.  A quick calculation shows 
>>> that to do a weekly scan at that size, we're talking ~10Gbps of 
>>> sustained reads.
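
For scale, those two bandwidth figures work out as follows (assuming 48 
volumes on the Thumper, and reading the weekly 800TB scan as an aggregate 
over the whole cluster rather than a single node):

public class ScanBandwidth {
    public static void main(String[] args) {
        double MB = 1e6, TB = 1e12;

        // Thumper: 48 volumes, each scanned at the 8 MB/s ceiling.
        double thumper = 48 * 8 * MB;                       // ~384 MB/s
        System.out.printf("Thumper aggregate: %.1f Gbps%n", thumper * 8 / 1e9);  // ~3.1

        // 800 TB scanned once per week, summed over the whole cluster.
        double cluster = 800 * TB / (7 * 86400.0);          // ~1.3 GB/s
        System.out.printf("800 TB per week:   %.1f Gbps%n", cluster * 8 / 1e9);  // ~10.6
    }
}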
>>
>> You have 110 TB on a single datanode and are moving to 800TB nodes? Note 
>> that this rate applies to the amount of data on a single datanode.
>>
> 
> Nah - 110TB total in the system (200 datanodes), and we will move to 800TB 
> total (probably 250 datanodes).
> 
> However, we do have some larger nodes (we range from 80GB to 48TB per 
> node); recent and planned purchases are in the 4-8TB per node range, but 
> I'd sure hate to throw away 48TB of disks :)
> 
> On the 48TB node, a scan at 8MB/s would take 70 days.  I'd have to run 
> at a rate of 80MB/s to scan through in 7 days.  While 80MB/s over 48 
> disks is not much, I was curious about how the rest of the system would 
> perform (the node is in production on a different file system right now, 
> so borrowing it is not easy...); 80MB/s sounds like an awful lot for 
> "background noise".
> 
> Do any other large sites run such large nodes?  How long of a period 
> between block scans do sites use in order to feel "safe" ?
> 
> Brian
> 
>> Raghu.
>>
>>> I still worry that the rate is too low; if we have a suspicious node, 
>>> or users report a problematic file, waiting a week for a full scan is 
>>> too long.  I've asked a student to implement a tool which can trigger 
>>> a full block scan of a path (the idea being that one could run "hadoop 
>>> fsck /path/to/file -deep").  What would be the best approach for him 
>>> to take to initiate a high-rate "full volume" or "full datanode" scan?
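
Not an answer to the datanode-side question, but one possible stopgap until 
such a tool exists: reading the file end-to-end through the normal client 
path verifies checksums on whichever replicas the client happens to read, 
and, as far as I understand the client, corrupt replicas it hits get 
reported to the namenode. A rough sketch using the standard FileSystem API 
(it only exercises the replicas actually read, not all three):

import java.io.IOException;

import org.apache.hadoop.conf.Configuration;
import org.apache.hadoop.fs.FSDataInputStream;
import org.apache.hadoop.fs.FileSystem;
import org.apache.hadoop.fs.Path;

// Read a file end-to-end so client-side checksum verification runs on it.
public class DeepRead {
    public static void main(String[] args) throws IOException {
        Configuration conf = new Configuration();
        FileSystem fs = FileSystem.get(conf);
        FSDataInputStream in = fs.open(new Path(args[0]));
        try {
            byte[] buf = new byte[1 << 20];
            while (in.read(buf) != -1) {
                // discard the data; reading it is enough to trigger the checks
            }
        } finally {
            in.close();
        }
    }
}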
>>
> 

