Subject: Re: Best way to migrate PB scale data between live cluster?
From: cs user
To: Jonathan Aquilina
Cc: raymond, user@hadoop.apache.org
Date: Mon, 18 Apr 2016 08:34:02 +0100

rsync is fairly low level; I guess it would be OK as a last resort to get back files held within Hadoop, but it might be difficult to reconstruct a Hadoop cluster using just the raw files on disk. It wouldn't be very quick in any case.

How are people doing disaster recovery with large Hadoop clusters, then? Let's say you have two data centers and you want to replicate data from one cluster to the other, so that if you lost your primary DC you could switch to the secondary one.
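(For illustration, the kind of scheduled cross-cluster copy I mean; a minimal sketch, where the cluster addresses and path are placeholders:)

    # Run nightly from cron on an edge node. -update copies only changed
    # files; -delete removes files on the target that no longer exist on
    # the source.
    hadoop distcp -update -delete \
        hdfs://primary-nn:8020/data \
        hdfs://secondary-nn:8020/data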

If you take a look here - http://hortonworks.com/partner/wandisco/

There is a paid-for solution from WANdisco that can perform this replication for you. Are there no other alternatives to this?

On Sun, Apr 17, 2016 at 11:18 AM, Jonathan Aquilina <jaquilina@eagleeyet.net> wrote:

Probably a stupid suggestion, but did you guys consider rsync? It is supposed to be quick and can do deletes.
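(Roughly this, for a filesystem-level mirror; the host and paths are made up:)

    # Archive mode (-a) preserves metadata; --delete propagates
    # deletions to the destination.
    rsync -a --delete /data/ backup-host:/data/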

On 2016-04-12 11:44, raymond wrote:

Hi

We have a Hadoop cluster with several PB of data, and we need to migrate it to a new cluster across data centers for larger volume capacity.
We estimate that the data copy itself might take nearly a month to finish, so we are looking for a sound solution. The requirements are as follows:
1. We cannot bring down the old cluster for such a long time (of course); a couple of hours of downtime is acceptable.
2. We need to mirror the data: not only copy new data, but also delete data on the target that is deleted on the source during the migration period.
3. We don't have much space left on the old cluster, say 30% free.

Regarding distcp: although it might be the easiest way, it has problems:

1. It does not handle data deletion.
2. It handles newly appended files by comparing file sizes and overwriting the whole file (which can waste a lot of bandwidth).
3. Its per-file error handling is rudimentary.
4. Load control is difficult (we still have a heavy workload on the old cluster); you can only try to split the work manually and make each piece small enough to achieve the flow-control goal (a sketch of the relevant flags follows this list).
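(For reference, a minimal sketch of the load-control knobs distcp does expose; the cluster addresses and path are placeholders:)

    # -m caps the number of simultaneous map tasks and -bandwidth caps
    # per-map throughput in MB/s, so total load on the source cluster
    # is roughly m * bandwidth MB/s.
    hadoop distcp -update \
        -m 20 -bandwidth 10 \
        hdfs://old-cluster:8020/warehouse \
        hdfs://new-cluster:8020/warehouse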

In a word: for a long-running mirroring job, it won't do well by itself.

There is some possible work that might need to be done around it:

We can:

  1. Do some wrapper work around distcp to make it work better (say, error handling, checking results, extra code to sync deleted files, etc.).
  2. Utilize the snapshot mechanism to better identify files that need to be copied, deleted, or renamed (see the snapshot sketch after this list).
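(A minimal sketch of the snapshot-diff route, assuming Hadoop 2.7+ where distcp accepts -diff; the paths and snapshot names are made up:)

    # One-time setup: allow snapshots on the source directory.
    hdfs dfsadmin -allowSnapshot /warehouse

    # Snapshot before the initial full copy, and again before each
    # incremental round.
    hdfs dfs -createSnapshot /warehouse s1
    hdfs dfs -createSnapshot /warehouse s2

    # List creates, deletes and renames between the two snapshots.
    hdfs snapshotDiff /warehouse s1 s2

    # Copy only the diff; requires -update, and the target must still
    # match snapshot s1.
    hadoop distcp -update -diff s1 s2 \
        hdfs://old-cluster:8020/warehouse \
        hdfs://new-cluster:8020/warehouse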

Or

  1. Forget about distcp. Use the FSIMAGE and edit log as a change-history source, and write our own code to replay the operations, handling each file one by one (better per-file error handling could be achieved), but this might need a lot of dev work (a possible starting point is sketched below).
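(If anyone goes down that road, the offline viewers that ship with HDFS might be a starting point for extracting the change history; the file names below are placeholders:)

    # Dump the namespace image and an edit-log segment to XML; a custom
    # replay tool could then parse out create/delete/rename operations.
    hdfs oiv -p XML -i fsimage_0000000000000000042 -o fsimage.xml
    hdfs oev -p xml -i edits_0000000000000000001-0000000000000000042 -o edits.xml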

Btw, the closest thing I could find is Facebook's migration of a 30PB Hive warehouse:

https://www.facebook.com/notes/facebook-engineering/moving-an-elephant-large-scale-hadoop-data-migration-at-facebook/10150246275318920/

They modified distcp to do an initial bulk load (to better handle very large and very small files, for load balancing I guess), and built a replication system (not much detail on this part) to mirror the changes.

But it is not clear how they handled the shortcomings of distcp I mentioned above, or whether they utilized the snapshot mechanism.

So, does anyone have experience with this kind of work? What do you think might be the best approach for our case? Is there any ready-made work we can utilize? Has any work been done around the snapshot mechanism to ease data migration?
