hadoop-hdfs-user mailing list archives

From Allen Wittenauer ...@apache.org>
Subject Re: keeping an active hdfs cluster balanced
Date Thu, 17 Mar 2011 21:20:06 GMT

On Mar 17, 2011, at 12:13 PM, Stuart Smith wrote:

> Parts of this may end up on the hbase list, but I thought I'd start here. My basic problem
> My cluster is getting full enough that having one data node go down does put a bit of
> pressure on the system (when balanced, every DN is more than half full).

	Usually around the 80% full mark is when HDFS starts getting a bit wonky on super active
grids. Your best bet is to either delete some data/store the data more efficiently, add more
nodes, or upgrade the storage capacity of the nodes you have.  The balancer is only going
to save you for so long before the whole thing tips over.
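
	As a rough illustration of watching for that mark, here is a minimal Python sketch that flags
data nodes above a fullness threshold. The function name, the input format (node name mapped to
used and total bytes), and the 0.80 default are assumptions for illustration only; how you collect
the per-node numbers is up to your own monitoring.

```python
def nodes_over_threshold(usage, threshold=0.80):
    """Return the names of data nodes whose used/capacity ratio exceeds threshold.

    usage: dict mapping a node name to a (used_bytes, capacity_bytes) tuple.
    The 0.80 default mirrors the rough "80% full" rule of thumb; it is not
    a limit enforced by HDFS.
    """
    over = []
    for name, (used, capacity) in sorted(usage.items()):
        # Skip nodes reporting zero capacity to avoid dividing by zero.
        if capacity > 0 and used / capacity > threshold:
            over.append(name)
    return over
```

	Calling it with something like {"dn1": (85, 100), "dn2": (50, 100)} would report only dn1.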

> Anybody here have any idea how badly running the balancer on a heavily active system
> messes things up? (for hdfs/hbase - if anyone knows).

	I don't run HBase, but at Y! we used to run the balancer pretty much every day, even on super
active grids.  It 'mostly works' until you get to the point of no return, which it sounds
like you are heading for...

> Any ideas? Or do I just need better hardware? Not sure if that's an option, though..

	Depending upon how your systems are configured, something else to look at is how much space
is getting eaten by logs, MapReduce spill space, etc.  A good daemon bounce might free up some
stale handles as well.
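
	One way to check this on a node is to total up the directories you suspect. The sketch below is
a minimal Python example; the function names are made up for illustration, and the paths you pass
in depend entirely on how your nodes are configured. Note that it only counts files still visible
in the directory tree, so space held by deleted-but-still-open files will not show up here.

```python
import os


def dir_size_bytes(path):
    """Sum the sizes of regular files under path, skipping symlinks."""
    total = 0
    for root, _dirs, files in os.walk(path):
        for name in files:
            full = os.path.join(root, name)
            try:
                if not os.path.islink(full):
                    total += os.path.getsize(full)
            except OSError:
                # The file vanished or is unreadable; ignore it and move on.
                pass
    return total


def space_report(paths):
    """Map each given directory to its total size in bytes."""
    return {p: dir_size_bytes(p) for p in paths}
```

	Running space_report on your log and spill directories gives a quick per-directory byte count to
compare against what the data node reports as non-DFS usage.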