hadoop-hdfs-user mailing list archives

From Scott Golby <sgo...@conductor.com>
Subject RE: HDFS drive, partition best practice
Date Mon, 07 Feb 2011 22:40:05 GMT

> The big issue you will encounter is losing a disk - the DataNode process will crash,
> and if you comment out the affected drive,
> when you replace it you will have 9 disks full to N% and one empty disk.
> The DFS balancer cannot fix this - usually when I have data nodes down more than an hour,
> I format all drives in the box and rebalance.

Yeah, this bites us when we add a disk. Love getting monitors going off for "disk 90% full"
when you've got the new disk at <10%.  We've tried a few tricks, moving the reserved blocks
up to force it to 'balance', but it's pretty ineffective by and large.
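
To put rough numbers on that kind of imbalance, here is a minimal sketch. The figures are
illustrative (nine disks at 90%, one new disk at 5%), not measurements from our cluster, and
`disk_imbalance` is a made-up helper name, not anything that ships with Hadoop.

```python
def disk_imbalance(used_fractions):
    """Return (mean, spread) of per-disk used fractions for one DataNode.

    `used_fractions` is a list of values in [0, 1], one per data disk.
    Spread is simply the fullest disk minus the emptiest disk.
    """
    mean = sum(used_fractions) / len(used_fractions)
    spread = max(used_fractions) - min(used_fractions)
    return mean, spread


# Assumed example: nine old disks at 90% and one freshly added disk at 5%.
disks = [0.90] * 9 + [0.05]
mean, spread = disk_imbalance(disks)
print(f"node average: {mean:.1%}, fullest minus emptiest: {spread:.1%}")
```

The node as a whole has headroom on average, yet a per-disk monitor still fires on the nine
old disks, which is the situation described above.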

>> but if the loss of a single drive necessitated rebuilding an entire node, and therefore
>> being down in capacity during that period,
>> just doesn't seem to be the most efficient approach

This bit about rebuilding the entire node isn't true; that's just Jonathan's choice to wipe
the node, and an interesting one it is (we might consider that for our small cluster).  Lose
a disk and you lose just the capacity of that disk from the entire pool of space in the

1 out of 3 copies of *some* of the HDFS blocks goes away, not the entire node's blocks. Usually
this wouldn't be very much of a loss (typical 4 disk boxes, x XYZ boxes = quite a few disks).
The 1 missing replica will likely be re-copied (I often say re-built, but that's RAID) before
you put the new disk in. But say somehow you were 100% full: you'd add the new disk and the
blocks which were in a 2 copies/replica state would copy themselves a 3rd time.  (The lack
of intra-node disk balance is an issue again here.)
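
As a back-of-the-envelope check on "not very much of a loss", here is a small sketch. It
assumes replicas are spread evenly over every disk in the cluster and that no block has two
replicas on the same disk; the 20-node figure is made up for illustration, and
`single_disk_loss` is a hypothetical helper, not a Hadoop API.

```python
def single_disk_loss(nodes, disks_per_node, replication=3):
    """Estimate the impact of losing one disk, assuming an even replica spread.

    Returns (replica_share, block_share):
      replica_share - fraction of all replicas that lived on the lost disk
      block_share   - fraction of blocks left with one replica fewer
    """
    total_disks = nodes * disks_per_node
    replica_share = 1.0 / total_disks
    # Each lost replica belongs to a different block, so the share of blocks
    # affected is `replication` times the share of replicas lost (capped at 1).
    block_share = min(1.0, replication * replica_share)
    return replica_share, block_share


# Assumed example: 20 boxes with 4 data disks each, replication factor 3.
replica_share, block_share = single_disk_loss(nodes=20, disks_per_node=4)
print(f"replicas lost: {replica_share:.2%}, blocks down to 2 copies: {block_share:.2%}")
```

Under those assumptions only a few percent of blocks drop to 2 copies, and the remaining
nodes can re-copy them between themselves.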

> We are building a new cluster aimed primarily at storage - we will be using SuperMicro
> 4U machines
> with 36 2TB SATA disks in three RAID6 volumes (for roughly 20TB usable per volume, 60

I really like the SuperMicro cases for big disk boxes.  What are you using to run the 36 disks
all at once?

Scott Golby
