hadoop-hdfs-user mailing list archives

From Koji Noguchi <knogu...@yahoo-inc.com>
Subject Re: keeping an active hdfs cluster balanced
Date Thu, 24 Mar 2011 18:12:00 GMT
Just a note:

> Usually around the ~80% full mark is when HDFS starts getting a bit wonky
These days, we have large grids over 90% full and still running fine.
The overall percentage of HDFS space used can be misleading.  We usually
monitor the percentage of datanodes that are full instead.
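(If you want a quick look at the per-datanode numbers, the dfsadmin
report has them.  A rough sketch, assuming the field labels haven't
changed in your version; the first "DFS Used%" line is the cluster-wide
summary and the rest are one per datanode:

  hadoop dfsadmin -report | grep 'DFS Used%'

Counting how many of those are in the 90s tells you a lot more than the
cluster-wide total does.)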


On 3/17/11 2:20 PM, "Allen Wittenauer" <aw@apache.org> wrote:

> On Mar 17, 2011, at 12:13 PM, Stuart Smith wrote:
>> Parts of this may end up on the hbase list, but I thought I'd start here. My
>> basic problem is:
>> My cluster is getting full enough that having one data node go down does put
>> a bit of pressure on the system (when balanced, every DN is more than half
>> full).
> Usually around the ~80% full mark is when HDFS starts getting a bit wonky on
> super active grids. Your best bet is to either delete some data/store the data
> more efficiently, add more nodes, or upgrade the storage capacity of the nodes
> you have.  The balancer is only going to save you for so long until the whole
> thing tips over.
>> Anybody here have any idea how badly running the balancer on a heavily active
>> system messes things up? (for hdfs/hbase - if anyone knows).
> I don't run HBase, but at Y! we used to run the balancer pretty much every
> day, even on super active grids.  It 'mostly works' until you get to the point
> of no return, which it sounds like you are heading for...
>> Any ideas? Or do I just need better hardware? Not sure if that's an option,
>> though..
> Depending upon how your systems are configured, something else to look at is
> how much space is getting eaten by logs, mapreduce spill space, etc.  A good
> daemon bounce might free up some stale handles as well.
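
For anyone who wants to script the daily run Allen describes, the stock
balancer can be started with something along these lines.  This is just
a sketch; threshold is the allowed deviation from the cluster-wide
average utilization in percent, and 10 is the default:

  hadoop balancer -threshold 10

The balancer can be stopped and restarted safely, so a cron job that
kicks it off during off-peak hours works fine.  The non-DFS space he
mentions is easy to eyeball with plain du against your log directories
and whatever mapred.local.dir points at; those paths vary per install,
so check your own configs.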
