hadoop-common-dev mailing list archives

From "Igor Bolotin" <ibolo...@gmail.com>
Subject RE: Question about HDFS allocations
Date Mon, 31 Dec 2007 21:56:40 GMT
There is a configuration property that allows you to reserve some disk space
on datanode servers:

  <description>Reserved space in bytes. Always leave this much space free
for non dfs use</description>
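
To relate a byte-valued reservation like this to the "80% of disk" idea in the question quoted below, here is a rough sketch of the arithmetic. The helper name reserved_bytes is hypothetical, and reading the 80 GB and 250 GB drive sizes as decimal gigabytes is an assumption; nothing here is Hadoop code.

```python
GB = 10**9  # assumption: drive sizes are quoted in decimal gigabytes


def reserved_bytes(capacity_bytes: int, max_used_percent: int) -> int:
    """Return how many bytes to leave free so that at most
    max_used_percent of the disk can be consumed.

    Hypothetical helper for illustration only.
    """
    if not 0 <= max_used_percent <= 100:
        raise ValueError("max_used_percent must be between 0 and 100")
    # Integer arithmetic keeps the result exact for large byte counts.
    return capacity_bytes * (100 - max_used_percent) // 100


if __name__ == "__main__":
    for size_gb in (80, 250):
        print(size_gb, "GB drive ->", reserved_bytes(size_gb * GB, 80), "bytes reserved")
```

Because the reservation is an absolute byte count rather than a percentage, a mixed cluster would presumably need a different value on each class of machine to hit the same percentage.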


-----Original Message-----
From: Michael Bieniosek [mailto:michael@powerset.com] 
Sent: Monday, December 31, 2007 1:27 PM
To: hadoop-dev@lucene.apache.org; Bryan Duxbury
Subject: Re: Question about HDFS allocations

AFAIK, HDFS doesn't have any notion of balancing data, nor can it do much to
avoid running disks full.  What you describe would certainly be a useful

There is a crude way to force the DFS to rebalance: if a machine gets too
full, you can remove it from the dfs cluster.  The namenode will then
redistribute all the blocks that were on that machine.  Then, you can wipe
your datanode's dfs data and bring it up afresh.


On 12/31/07 11:31 AM, "Bryan Duxbury" <bryan@rapleaf.com> wrote:

We've been doing some testing with HBase, and one of the problems we
ran into was that our machines are not homogeneous in terms of disk
capacity. A few of our machines have only 80 GB drives, while the rest
have 250 GB drives. Because blocks are distributed equally across
machines, the smaller machines filled up first, their drives
overflowed, and they came to a crashing halt. Since one of these
machines was also the namenode, it broke the rest of the cluster.

What I'm wondering is whether there should be a way to tell HDFS to only
use something like 80% of available disk space before considering a
machine full. Would this be a useful feature, or should we approach
the problem from another angle, like using a separate HDFS data

