hadoop-common-user mailing list archives

From Bill Graham <billgra...@gmail.com>
Subject Re: Changing hostnames of tasktracker/datanode nodes - any problems?
Date Tue, 10 Aug 2010 18:03:38 GMT
Ahh yes of course, distcp. Thanks!

On Tue, Aug 10, 2010 at 11:01 AM, Allen Wittenauer <awittenauer@linkedin.com> wrote:

>
> On Aug 10, 2010, at 10:54 AM, Bill Graham wrote:
> > Is it correct to say that that would work fine? We have a replication
> > factor of 2, so we'd be copying twice as much data as we need to, so I'm
> > sure there's a more efficient approach.
>
> It should work fine.  But yes, highly inefficient.
>
> > What about adding the new nodes in the new colo to the existing cluster,
> > rebalancing and then decommissioning the old cluster nodes before finally
> > migrating the NN/SNN? I know Hadoop isn't intended to run cross-colo, but
> > would this be a more efficient approach than the one above?
>
> If you can keep both grids up at the same time, use distcp to do the copy.
> This will make sure the blocks get copied once, will keep permissions with
> -p, keep the replication factor, redistribute data (free balancing!), etc.
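
For anyone finding this thread later, here is a rough sketch of the distcp invocation Allen describes, wrapped in a small Python helper that only builds the command line. The NameNode hostnames, port and path are made up for illustration, and the `-prbugp` spelling (preserve replication, block size, user, group, permission) is taken from the distcp usage text, so check it against `hadoop distcp` on your own version before relying on it.

```python
import shlex


def build_distcp_command(src_uri, dst_uri, preserve="rbugp"):
    """Build the argv list for a single cross-cluster distcp copy.

    preserve holds the letters appended to -p; the default asks distcp to
    keep replication, block size, user, group and permission.
    """
    return ["hadoop", "distcp", "-p" + preserve, src_uri, dst_uri]


if __name__ == "__main__":
    # Hypothetical NameNode addresses for the old and new colo.
    cmd = build_distcp_command(
        "hdfs://old-nn.example.com:8020/data",
        "hdfs://new-nn.example.com:8020/data",
    )
    # Print the command for review instead of running it.
    print(shlex.join(cmd))
```

Run from a machine that can reach both grids, the printed command copies each file once into the new cluster, and the new NameNode places the blocks itself, which is where the "free balancing" comes from.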
