From: Ted Dunning
Date: Thu, 28 Mar 2013 08:45:12 +0100
Subject: Re: Hadoop distcp from CDH4 to Amazon S3 - Improve Throughput
To: "common-user@hadoop.apache.org"

The EMR distributions have special versions of the s3 file system. They might be helpful here. Of course, you likely aren't running those if you are seeing 5 MB/s.

An extreme alternative would be to light up an EMR cluster, copy to it, then to S3.

On Thu, Mar 28, 2013 at 4:54 AM, Himanish Kushary wrote:

> I am thinking of either transferring individual folders instead of the
> entire 70 GB folder as a workaround, or, as another option, increasing the
> "mapred.task.timeout" parameter to something like 6-7 hours (as the avg
> rate of transfer to S3 seems to be 5 MB/s). Is there any other better
> option to increase the throughput for transferring bulk data from HDFS to
> S3? Looking forward to suggestions.
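The timeout workaround from the quoted message can be sketched as a single distcp invocation. This is a sketch under assumptions, not a verified command: the namenode host, source path, bucket name, and credentials are placeholders, and `mapred.task.timeout` / the `s3n://` scheme are the MRv1-era names that applied to CDH4. The timeout value is in milliseconds:

```shell
# Hypothetical distcp run with a raised task timeout. At the observed
# ~5 MB/s, a 7-hour timeout gives long single-file uploads room to finish.
# 7 h = 7 * 3600 * 1000 ms = 25200000 ms
hadoop distcp \
  -D mapred.task.timeout=25200000 \
  -m 20 \
  hdfs://namenode:8020/data/bigfolder \
  s3n://ACCESS_KEY:SECRET_KEY@mybucket/bigfolder
```

The `-m 20` cap on simultaneous map tasks is also an assumption; limiting concurrent uploads can help avoid S3 request throttling, but the right number depends on cluster size and outbound bandwidth.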
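The other workaround in the quoted message, copying individual folders instead of the whole 70 GB tree, can be sketched as a loop over top-level subdirectories. Again a hedged sketch: the `SRC`/`DST` values are placeholders, and the `hadoop fs -ls` output parsing assumes the path is the last field of each listing line:

```shell
# Hypothetical per-folder copy: each subfolder runs as its own distcp job,
# so one slow transfer cannot time out the entire 70 GB copy.
SRC=hdfs://namenode:8020/data/bigfolder   # assumed source root
DST=s3n://mybucket/bigfolder              # assumed destination bucket/prefix

for dir in $(hadoop fs -ls "$SRC" | awk '{print $NF}' | grep "^$SRC/"); do
  hadoop distcp -D mapred.task.timeout=25200000 \
    "$dir" "$DST/$(basename "$dir")"
done
```

A side benefit of this approach is that a failed subfolder can be retried on its own instead of restarting the full transfer.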