hadoop-common-user mailing list archives

From Harish Mallipeddi <harish.mallipe...@gmail.com>
Subject Re: What will we encounter if we add a lot of nodes into the current cluster?
Date Thu, 13 Aug 2009 04:14:36 GMT
On Thu, Aug 13, 2009 at 8:06 AM, yang song <hadoop.inifok@gmail.com> wrote:

> Thank you for teaching me that.
> I'm trying to use the balancer tool (bin/hadoop balancer -t xxx), but the
> data transfer is so slow that it will take a very long time.
> Is there a good way to speed it up?
> Also, I have a question. In our situation we rarely use the existing data
> in the cluster, so rebalancing the existing data doesn't seem worthwhile.
> For that reason, I intend not to rebalance the data. Is that reasonable?
I think if you add new nodes to the cluster and don't rebalance, Hadoop is
probably going to favor the new (emptier) nodes over the older ones for all
the new data you write into HDFS. As a result, even your new data probably
won't be balanced evenly across the cluster. If you're going to run m/r jobs
on this new data, it's a good idea to have it spread evenly across the
cluster.
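On the slow-transfer question: a common reason the balancer crawls is that
each datanode caps balancing traffic at a conservative default (1 MB/s in
this Hadoop generation, via the dfs.balance.bandwidthPerSec property). A
minimal sketch, assuming that property name and a 0.20-era layout (the exact
value and threshold are examples, not recommendations):

```shell
# Assumption: Hadoop 0.20-era config layout. Raise the per-datanode
# balancing bandwidth cap in conf/hdfs-site.xml on each datanode
# (value is in bytes per second; default is 1048576 = 1 MB/s):
#
# <property>
#   <name>dfs.balance.bandwidthPerSec</name>
#   <value>10485760</value>  <!-- example: 10 MB/s -->
# </property>
#
# After restarting the datanodes, re-run the balancer. The -threshold
# flag sets how far (in percent of capacity) a node's utilization may
# deviate from the cluster average before it is considered unbalanced:
bin/hadoop balancer -threshold 10
```

A higher bandwidth cap lets blocks move faster, at the cost of competing
with regular datanode traffic, so pick a value your network can absorb.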

Harish Mallipeddi
