Date: Wed, 27 Mar 2013 23:54:07 -0400
Subject: Hadoop distcp from CDH4 to Amazon S3 - Improve Throughput
From: Himanish Kushary
To: user@hadoop.apache.org

Hello,

I am trying to transfer around 70 GB of files from HDFS to Amazon S3 using the distcp utility. There are around 2200 files distributed over 15 directories. The largest individual file is approximately 50 MB.

The distcp MapReduce job keeps failing with this error:

"Task attempt_201303211242_0260_m_000005_0 failed to report status for 600 seconds. Killing!"

and in the task attempt logs I can see a lot of INFO messages like:

"INFO org.apache.commons.httpclient.HttpMethodDirector: I/O exception (java.io.IOException) caught when processing request: Resetting to invalid mark"

As a workaround, I am considering either transferring individual folders instead of the entire 70 GB, or increasing the "mapred.task.timeout" parameter to something like 6-7 hours (since the average transfer rate to S3 seems to be 5 MB/s). Is there any better option to increase the throughput for transferring bulk data from HDFS to S3?

Looking forward to suggestions.

--
Thanks & Regards
Himanish
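The first workaround mentioned above (copying each of the 15 directories in its own distcp job, with a raised task timeout) can be sketched as a shell loop. This is only a sketch: the namenode address, bucket name, and directory names are illustrative placeholders, not taken from the original message; the timeout of 6 hours is expressed in milliseconds as `mapred.task.timeout` expects.

```shell
#!/bin/sh
# Sketch: run one distcp job per top-level directory instead of a single
# 70 GB job, with mapred.task.timeout raised to 6 hours (21,600,000 ms).
# SRC, DST, and the directory list below are illustrative placeholders.
SRC=hdfs://namenode:8020/data
DST=s3n://mybucket/backup
TIMEOUT_MS=21600000   # 6 h * 3600 s * 1000 ms

for dir in dir01 dir02 dir03; do
  CMD="hadoop distcp -D mapred.task.timeout=$TIMEOUT_MS $SRC/$dir $DST/$dir"
  # Dry run: print the command; replace 'echo "$CMD"' with '$CMD' to execute.
  echo "$CMD"
done
```

Splitting the copy this way also bounds the damage of a failed job to one directory, so a retry re-copies at most one directory's worth of data rather than the full 70 GB.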