From: Shivram Mani
Date: Fri, 17 Oct 2014 22:24:22 -0700
Subject: Re: how to copy data between two hdfs cluster fastly?
To: user@hadoop.apache.org

DistCp is fairly restrictive when it comes to parallelizing a copy. If all you are copying is one large file, DistCp won't make it any faster.

In DistCp, a file is the lowest level of granularity, so increasing the number of maps may not necessarily increase overall throughput.

The default number of mappers for DistCp, if I'm not wrong, is 20. If all you are doing is copying one large file, only one map task is effectively used.
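
To make the file-granularity point concrete, here is a rough back-of-the-envelope sketch in Python. It is not how DistCp actually schedules work: the function name `estimate_copy_seconds`, the greedy largest-first assignment, the file sizes, and the 50 MB/s per-map rate are all assumptions chosen only for illustration. The one property it takes from the discussion above is that a file is never split across map tasks.

```python
def estimate_copy_seconds(file_sizes_mb, num_maps, mb_per_sec_per_map):
    """Rough copy-time estimate when a file is never split across maps.

    Simplified model (an assumption, not DistCp's real scheduler): files are
    handed out largest first, each to the map with the least data so far.
    The copy finishes when the most heavily loaded map finishes.
    """
    loads = [0.0] * max(1, num_maps)
    for size in sorted(file_sizes_mb, reverse=True):
        least_loaded = loads.index(min(loads))
        loads[least_loaded] += size
    return max(loads) / mb_per_sec_per_map


# Hypothetical workloads with the same total volume (100,000 MB).
one_big_file = [100_000]        # a single large file
many_files = [1_000] * 100      # 100 files of 1,000 MB each

for maps in (1, 20, 40):
    big = estimate_copy_seconds(one_big_file, maps, 50.0)
    many = estimate_copy_seconds(many_files, maps, 50.0)
    print(f"maps={maps:>2}  one big file: {big:.0f}s  many files: {many:.0f}s")
```

In this toy model the single large file takes the same time with 1, 20, or 40 maps, because only one map ever has work to do, while the many-file workload gets faster as maps are added.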


On Fri, Oct 17, 2014 at 8:18 PM, ch huang <justlooks@gmail.com> wrote:
> yes
>
> On Sat, Oct 18, 2014 at 3:53 AM, Jakub Stransky <stransky.ja@gmail.com> wrote:
>
>> Distcp?
>>
>> On 17 Oct 2014 20:51, "Alexander Pivovarov" <apivovarov@gmail.com> wrote:
>>
>>> try to run on dest cluster datanode
>>> $ hadoop fs -cp hdfs://from_cluster/.... hdfs://to_cluster/....
>>>
>>> On Fri, Oct 17, 2014 at 11:26 AM, Shivram Mani <smani@pivotal.io> wrote:
>>>
>>>> What is your approx input size?
>>>> Do you have multiple files or is this one large file?
>>>> What is your block size (source and destination cluster)?
>>>>
>>>> On Fri, Oct 17, 2014 at 4:19 AM, ch huang <justlooks@gmail.com> wrote:
>>>>
>>>>> no, all default
>>>>>
>>>>> On Fri, Oct 17, 2014 at 5:46 PM, Azuryy Yu <azuryyyu@gmail.com> wrote:
>>>>>
>>>>>> Did you specify how many map tasks?
>>>>>>
>>>>>> On Fri, Oct 17, 2014 at 4:58 PM, ch huang <justlooks@gmail.com> wrote:
>>>>>>
>>>>>>> hi, maillist:
>>>>>>> I now use distcp to migrate data from CDH4.4 to CDH5.1. I find that when copying small files it is very good, but when transferring big data it is very slow. Any good method to recommend? Thanks
>>>>
>>>> --
>>>> Thanks
>>>> Shivram

--
Thanks
Shivram