Date: Sun, 17 Apr 2016 12:18:11 +0200
From: Jonathan Aquilina <jaquilina@eagleeyet.net>
To: raymond
Cc: user@hadoop.apache.org
Subject: Re: Best way to migrate PB scale data between live cluster?

Probably a stupid suggestion, but did you guys consider rsync? It is supposed to be quick and can handle deletes.

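For what it's worth, a minimal sketch of the kind of run I mean, assuming the data is reachable as ordinary file paths rather than through HDFS (the paths and the bandwidth cap are made up):

# Hypothetical sketch: mirror a directory to the new datacenter with rsync,
# propagating deletions and throttling bandwidth to spare the live cluster.
import subprocess

result = subprocess.run([
    "rsync",
    "-a",                  # archive mode: recurse, preserve times and permissions
    "--delete",            # remove files on the target that are gone on the source
    "--bwlimit=50000",     # cap throughput (rsync takes KiB/s, so roughly 50 MB/s)
    "/data/warehouse/",    # trailing slash: sync the directory contents
    "newdc:/data/warehouse/",
])
print("rsync exited with", result.returncode)

Rerunning it periodically would pick up new and deleted files on each pass.
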
On 2016-04-12 11:44, raymond wrote:

Hi

We have a Hadoop cluster with several PB of data, and we need to migrate it to a new cluster in another datacenter for larger volume capacity.
We estimate that the data copy itself might take nearly a month to finish, so we are seeking a sound solution. The requirements are as follows:
1. We cannot bring down the old cluster for such a long time (of course); a couple of hours of downtime is acceptable.
2. We need to mirror the data: not only copy the new data, but also propagate the deletions that happen during the migration period.
3. We don't have much space left on the old cluster, say 30% free.

Regarding distcp: although it might be the easiest way, it has problems:

1. It does not handle data deletion.
2. It handles newly appended files by comparing file sizes and re-copying the whole file (which might waste a lot of bandwidth).
3. Its per-file error handling is crude.
4. Load control is difficult (we still have a heavy workload on the old cluster); you can only split the work manually into pieces small enough to achieve the flow-control goal (a throttled run is sketched below).

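For illustration, a throttled distcp run that caps the number of maps with -m and the per-map bandwidth with -bandwidth (the cluster names and numbers are hypothetical):

# Hypothetical sketch: bound distcp's load on the live cluster by limiting
# concurrent map tasks and the bandwidth each map may use.
import subprocess

subprocess.run([
    "hadoop", "distcp",
    "-update",            # copy only files that differ from the target
    "-m", "20",           # at most 20 concurrent map tasks
    "-bandwidth", "10",   # roughly 10 MB/s per map, so about 200 MB/s in total
    "hdfs://oldcluster/warehouse",
    "hdfs://newcluster/warehouse",
], check=True)
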
In short, for a long-running mirroring job, distcp won't do well by itself.

There is some possible work that might need to be done:

We can:

  1. Do some wrapper work around distcp to make it work better (error handling, checking results, extra code to sync deleted files, etc.); see the sketch after this list.
  2. Utilize the snapshot mechanism to better identify the files that need to be copied, deleted, or renamed.

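A minimal sketch of such a wrapper, assuming snapshots named s1 and s2 already exist on the source directory (all names are hypothetical, and the snapshotDiff output format should be checked against your Hadoop version):

# Hypothetical sketch: run distcp with retries, then replay the deletions
# reported by "hdfs snapshotDiff" onto the target cluster.
import subprocess

SRC = "hdfs://oldcluster/warehouse"
DST = "hdfs://newcluster/warehouse"

def run_distcp_with_retries(attempts=3):
    for i in range(attempts):
        if subprocess.run(["hadoop", "distcp", "-update", SRC, DST]).returncode == 0:
            return
        print("distcp attempt", i + 1, "failed, retrying")
    raise RuntimeError("distcp failed after %d attempts" % attempts)

def replay_deletes(snap_from="s1", snap_to="s2"):
    # Deleted entries appear as lines starting with "-", e.g. "-  ./sub/file".
    out = subprocess.run(["hdfs", "snapshotDiff", SRC, snap_from, snap_to],
                         capture_output=True, text=True, check=True).stdout
    for line in out.splitlines():
        if line.startswith("-"):
            rel = line.split(None, 1)[1].lstrip("./")
            subprocess.run(["hdfs", "dfs", "-rm", "-r", DST + "/" + rel])

run_distcp_with_retries()
replay_deletes()

On versions whose distcp supports snapshot diffs, "hadoop distcp -update -diff s1 s2 <src> <dst>" can fold much of this, including renames and deletes, into a single step.
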
Or

  1. Forget about distcp. Use the FSIMAGE and editlog as a change-history source and write our own code to replay the operations, handling each file one by one (better per-file error handling could be achieved), but this might need a lot of dev work. A rough sketch of reading the editlog follows.

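Hadoop ships an offline edits viewer (hdfs oev) that can dump an editlog segment to XML for custom code to walk. Opcode names and the XML layout vary by version, so treat the following as an assumption to verify; the paths are hypothetical:

# Hypothetical sketch: dump an editlog segment to XML with the offline edits
# viewer, then replay the deletions it records onto the new cluster.
import subprocess
import xml.etree.ElementTree as ET

EDITS = "/data/dfs/name/current/edits_0000000000000000001-0000000000000000100"
subprocess.run(["hdfs", "oev", "-p", "xml", "-i", EDITS, "-o", "/tmp/edits.xml"],
               check=True)

for record in ET.parse("/tmp/edits.xml").getroot().iter("RECORD"):
    if record.findtext("OPCODE") == "OP_DELETE":  # verify opcode names per version
        path = record.findtext("DATA/PATH")
        subprocess.run(["hdfs", "dfs", "-rm", "-r", "hdfs://newcluster" + path])

Other opcodes (creates, renames, appends) would need their own handlers, which is where most of the dev work would go.
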
Btw, the closest thing I could find is Facebook's migration of a 30PB Hive warehouse:

https://www.facebook.com/notes/facebook-engineering/moving-an-elephant-large-scale-hadoop-data-migration-at-facebook/10150246275318920/

They modified distcp to do an initial bulk load (to better handle very large and very small files, for load balancing I guess), plus a replication system (not much detail on this part) to mirror the changes.

But it is not clear how they handled the shortcomings of distcp I mentioned above, or whether they utilized the snapshot mechanism.

So, does anyone have experience with this kind of work? What do you think might be the best approach for our case? Is there any ready-made work we can utilize? Has any work been done around the snapshot mechanism to ease data migration?