hadoop-hdfs-user mailing list archives

From Jonathan Disher <jdis...@parad.net>
Subject Re: DataNode internal balancing, performance recommendations
Date Mon, 03 Jan 2011 19:55:18 GMT
The problem is, what do you define as a failure?  If the disk is failing, writes to the
filesystem will fail - how does Hadoop differentiate between a permissions problem and a
physical disk failure?  They both return an error.

And yeah, the idea of stopping the datanode, removing the affected mount from hdfs-site.xml,
and restarting has been discussed.  The problem is, when that disk gets replaced and re-added,
I have horrible internal balance issues, which is exactly the problem I have now :(
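
For reference, the behaviour Eli describes in the quoted reply below can be written down as a tiny sketch. This is only an illustration of the rule as stated in that message, not Hadoop's actual code; the function names keeps_serving and can_start are made up for the example.

```python
def keeps_serving(failed_volumes: int, tolerated: int) -> bool:
    """A running DataNode keeps offering service while the number of
    failed volumes does not exceed dfs.datanode.failed.volumes.tolerated
    (rule as described in the quoted reply; hypothetical helper)."""
    return failed_volumes <= tolerated


def can_start(failed_volumes: int) -> bool:
    """Startup as described in the quoted reply (the HDFS-1158 issue):
    any failed volume blocks startup, regardless of the tolerated value."""
    return failed_volumes == 0


if __name__ == "__main__":
    # One dead disk with tolerated=1: the running DN stays up,
    # but a restart would fail until the volume is fixed or unconfigured.
    print(keeps_serving(1, 1), can_start(1))
```

So with one dead disk and a tolerated value of 1, the running datanode survives, but a restart still requires unconfiguring or fixing that volume.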


On Jan 3, 2011, at 9:07 AM, Eli Collins wrote:

> Hey Jonathan,
> There's an option (dfs.datanode.failed.volumes.tolerated, introduced
> in HDFS-1161) that allows you to specify the number of volumes that
> are allowed to fail before a datanode stops offering service.
> There's an operational issue that still needs to be addressed
> (HDFS-1158) that you should be aware of - the DN will still not start
> if any of the volumes have failed, so to restart the DN you'll need
> to either unconfigure the failed volumes or fix them. I'd
> like to make DN startup respect the config value so it tolerates
> failed volumes on startup as well.
> Thanks,
> Eli
> On Sun, Jan 2, 2011 at 7:20 PM, Jonathan Disher <jdisher@parad.net> wrote:
>> I see that there was a thread on this in December, but I can't retrieve it to reply
>> properly, oh well.
>> So, I have a 30 node cluster (plus separate namenode, jobtracker, etc).  Each is
>> a 12 disk machine - two mirrored 250GB OS disks, ten 1TB data disks in JBOD.  Original system
>> config was six 1TB data disks - we added the last four disks months later.  I'm sure you can
>> all guess, we have some interesting internal usage balancing issues on most of the nodes.
>> To date, when individual disks get critically low on space (earlier this week I had a node
>> with six disks around 97% full, four around 70%), we've been pulling them from the cluster,
>> formatting the data disks, and sticking them back in (with a rebalance running to keep the
>> cluster in some semblance of order).
>> Obviously if there was a better way to do this, I'd love to see it.  I see that there
>> are recommendations of killing the DataNode process and manually moving files, but my concern
>> is that the DataNode process will spend an enormous amount of time tracking down these moves
>> (currently around 820,000 blocks/node).  And it's not necessarily easy to automate, so there's
>> the danger of nuking blocks, and making the problems worse.  Are there alternatives to manual
>> moves (or more automated ways that exist)?  Or has my brute-force rebalance got the best chance
>> of success, albeit slowly?
>> We are also building a new cluster - starting around 1.2PB raw, eventually growing
>> to around 5PB, for near-line storage of data.  Our storage nodes will probably be 4U systems
>> with 72 data disks each (yeah, good times).  The problem with this becomes obvious - with
>> the way Hadoop works today, if a disk fails, the datanode process chokes and dies when it
>> tries to write to it.  We've been told repeatedly that Hadoop doesn't perform well when it
>> operates on RAID arrays, but, to scale effectively, we're going to have to do just that -
>> three 24 disk controllers in RAID-6 mode.  How bad is this going to be?  JBOD just doesn't
>> scale beyond a couple of disks per machine; the failure rate will knock machines out of the
>> cluster too often (and at 60TB per node, rebalancing will take forever, even if I let it saturate
>> I appreciate opinions and suggestions.  Thanks!
>> -j
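
To put a number on the internal imbalance described in the original message (six 1TB disks around 97% full, four around 70%), here is a rough back-of-the-envelope sketch. It assumes equal 1TB capacities and treats the quoted percentages as exact; rebalance_plan is a hypothetical helper for the arithmetic only, not a Hadoop tool.

```python
def rebalance_plan(used_fractions, capacity_tb=1.0):
    """Return (target_fraction, per_disk_delta_tb) for equal-sized disks.

    A positive delta means that disk must shed that many TB to reach the
    node-wide average; a negative delta means it has room to receive.
    """
    total_used_tb = sum(f * capacity_tb for f in used_fractions)
    target = total_used_tb / (len(used_fractions) * capacity_tb)
    deltas = [(f - target) * capacity_tb for f in used_fractions]
    return target, deltas


if __name__ == "__main__":
    # Figures from the original message: six disks ~97% full, four ~70%.
    disks = [0.97] * 6 + [0.70] * 4
    target, deltas = rebalance_plan(disks)
    tb_to_move = sum(d for d in deltas if d > 0)
    print(f"target utilisation: {target:.1%}")
    print(f"data to move within the node: {tb_to_move:.2f} TB")
```

Under those assumptions the node averages out at roughly 86% per disk, and about 0.65 TB would have to move from the six fuller disks to the four emptier ones to even things out.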
