hadoop-common-user mailing list archives

From Allen Wittenauer <awittena...@linkedin.com>
Subject Re: Changing hostnames of tasktracker/datanode nodes - any problems?
Date Tue, 10 Aug 2010 18:01:47 GMT

On Aug 10, 2010, at 10:54 AM, Bill Graham wrote:
> Is it correct to say that that would work fine? We have a replication factor
> of 2, so we'd be copying twice as much data as we'd need to so I'm sure
> there's a more efficient approach.

It should work fine.  But yes, highly inefficient.

> What about adding the new nodes in the new colo to the existing cluster,
> rebalancing and then decommissioning the old cluster nodes before finally
> migrating the NN/SNN? I know Hadoop isn't intended to run cross-colo, but
> would this be a more efficient approach than the one above?

If you can keep both grids up at the same time, use distcp to do the copy.  This will make
sure the blocks get copied once, will keep permissions with -p, keep the replication factor,
redistribute data (free balancing!), etc, etc, etc.
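
A rough sketch of what such a distcp invocation might look like, written as a small Python helper that only assembles the command line and does not run it. The NameNode hostnames, the port 8020, and the /user/data path are hypothetical placeholders, not values from this thread. The -p flag is shown with its explicit letters (r = replication, b = block size, u = user, g = group, p = permission). If the two grids run different Hadoop versions, the source URI scheme may need to differ from plain hdfs://, which this sketch does not handle.

```python
import shlex


def build_distcp_command(src_nn, dst_nn, path, preserve="rbugp"):
    """Assemble the argv for a DistCp copy of `path` between two clusters.

    src_nn and dst_nn are "host:port" strings for the source and destination
    NameNodes (hypothetical values here). `preserve` is the set of letters
    appended to -p: r(eplication), b(lock size), u(ser), g(roup), p(ermission).
    """
    src = f"hdfs://{src_nn}{path}"
    dst = f"hdfs://{dst_nn}{path}"
    return ["hadoop", "distcp", f"-p{preserve}", src, dst]


if __name__ == "__main__":
    # Placeholder hosts and path; substitute the real NameNode addresses.
    cmd = build_distcp_command(
        "old-nn.example.com:8020",
        "new-nn.example.com:8020",
        "/user/data",
    )
    # Print the command for review instead of executing it.
    print(shlex.join(cmd))
```

Printing the command rather than executing it makes it easy to check the source and destination URIs before launching the job, which would typically be started from the destination grid.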
