hadoop-hdfs-user mailing list archives

From stu24m...@yahoo.com
Subject Re: keeping an active hdfs cluster balanced
Date Thu, 17 Mar 2011 22:09:16 GMT
Thanks Allen!

This all makes sense. 
I'm already looking into expiring data, and good suggestion about the logs. I could store some
data more efficiently, but I'm not sure I have any big wins I can pull off.

I'm in the midst of an OS upgrade & hope to switch from Apache to CDH as well. Hopefully
I can clean some stuff up in the process.

It does sound like I'm just going to have to find some hardware somewhere..

Take care,

-----Original Message-----
From: Allen Wittenauer <aw@apache.org>
Date: Thu, 17 Mar 2011 14:20:06 
To: <hdfs-user@hadoop.apache.org>
Reply-To: hdfs-user@hadoop.apache.org
Subject: Re: keeping an active hdfs cluster balanced

On Mar 17, 2011, at 12:13 PM, Stuart Smith wrote:

> Parts of this may end up on the hbase list, but I thought I'd start here. My basic problem
> My cluster is getting full enough that having one data node go down does put a bit of
> pressure on the system (when balanced, every DN is more than half full).

	Usually around the ~80% full mark is when HDFS starts getting a bit wonky on super active
grids. Your best bet is to either delete some data/store the data more efficiently, add more
nodes, or upgrade the storage capacity of the nodes you have.  The balancer is only going
to save you for so long until the whole thing tips over.
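
As a rough illustration of the ~80% rule of thumb above, here is a minimal sketch that flags datanodes at or past a fullness threshold. The function name, the input dictionary, and the node names are hypothetical; in practice the used/capacity figures would have to come from whatever reporting you already have for the cluster.

```python
def nodes_over_threshold(usage, threshold_pct=80.0):
    """Return {node: percent_used} for nodes at or above threshold_pct.

    usage is a hypothetical mapping: node name -> (used_bytes, capacity_bytes).
    """
    flagged = {}
    for name, (used, capacity) in usage.items():
        if capacity <= 0:
            # Skip nodes that report no capacity (e.g. dead or misconfigured).
            continue
        pct = 100.0 * used / capacity
        if pct >= threshold_pct:
            flagged[name] = round(pct, 1)
    return flagged


if __name__ == "__main__":
    # Made-up numbers, for illustration only.
    sample = {"dn1": (85, 100), "dn2": (50, 100)}
    print(nodes_over_threshold(sample))
```

The 80% default only mirrors the rule of thumb quoted above; it is not an HDFS setting.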

> Anybody here have any idea how badly running the balancer on a heavily active system
> messes things up? (for hdfs/hbase - if anyone knows).

	I don't run HBase, but at Y! we used to run the balancer pretty much every day, even on super
active grids.  It 'mostly works' until you get to the point of no return, which it sounds
like you are heading for...
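
To make "balanced" concrete: the balancer treats a datanode as balanced when its utilization is within a threshold (in percentage points) of the cluster-wide utilization. The sketch below applies that check to hypothetical per-node figures; the function name and the input format are assumptions for illustration, not anything the balancer itself exposes.

```python
def unbalanced_nodes(usage, threshold_pct=10.0):
    """Return {node: deviation} for nodes outside the balance threshold.

    usage is a hypothetical mapping: node name -> (used_bytes, capacity_bytes).
    deviation is node utilization minus cluster utilization, in percentage
    points (positive means over-utilized, negative means under-utilized).
    """
    total_used = sum(used for used, _ in usage.values())
    total_capacity = sum(capacity for _, capacity in usage.values())
    if total_capacity <= 0:
        return {}
    cluster_pct = 100.0 * total_used / total_capacity

    result = {}
    for name, (used, capacity) in usage.items():
        if capacity <= 0:
            continue
        deviation = 100.0 * used / capacity - cluster_pct
        if abs(deviation) > threshold_pct:
            result[name] = round(deviation, 1)
    return result


if __name__ == "__main__":
    # Made-up numbers, for illustration only.
    sample = {"dn1": (90, 100), "dn2": (50, 100), "dn3": (40, 100)}
    print(unbalanced_nodes(sample))
```

Note that a cluster can pass this check and still be in trouble: if every node is evenly 85% full, nothing is flagged, which is the "point of no return" case where the balancer no longer helps.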

> Any ideas? Or do I just need better hardware? Not sure if that's an option, though..

	Depending upon how your systems are configured, something else to look at is how much space
is getting eaten by logs, mapreduce spill space, etc.  A good daemon bounce might free up some
stale handles as well.
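
One way to check how much local disk the logs and spill directories are using is to total file sizes per directory on each node. This is only a sketch: the function name is made up, and the directories you pass in depend entirely on how your nodes are configured.

```python
import os


def dir_size_bytes(path):
    """Sum the sizes of all regular files under path (symlinks not followed)."""
    total = 0
    for root, _dirs, files in os.walk(path):
        for fname in files:
            full = os.path.join(root, fname)
            try:
                if not os.path.islink(full):
                    total += os.path.getsize(full)
            except OSError:
                # File vanished or is unreadable; skip it rather than abort.
                pass
    return total


if __name__ == "__main__":
    import sys

    # Pass the log and spill directories for your own setup as arguments.
    for directory in sys.argv[1:]:
        print(directory, dir_size_bytes(directory))
```

Files that were deleted while a daemon still holds them open will not show up in a walk like this, which is the stale-handle case the daemon bounce is meant to clear.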