Date: Thu, 31 Mar 2011 17:19:06 +0000 (UTC)
From: "Rosie Li (JIRA)"
To: mapreduce-issues@hadoop.apache.org
Reply-To: mapreduce-issues@hadoop.apache.org
Message-ID: <335347971.24884.1301591946568.JavaMail.tomcat@hel.zones.apache.org>
In-Reply-To: <26895646.290321294770708037.JavaMail.jira@thor>
Subject: [jira] [Updated] (MAPREDUCE-2257) distcp can copy blocks in parallel

     [
https://issues.apache.org/jira/browse/MAPREDUCE-2257?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Rosie Li updated MAPREDUCE-2257:
--------------------------------

    Attachment: MAPREDUCE-2257.patch

Fix the FindBugs warning.

> distcp can copy blocks in parallel
> ----------------------------------
>
>                 Key: MAPREDUCE-2257
>                 URL: https://issues.apache.org/jira/browse/MAPREDUCE-2257
>             Project: Hadoop Map/Reduce
>          Issue Type: Improvement
>          Components: distcp
>    Affects Versions: 0.21.0
>            Reporter: dhruba borthakur
>            Assignee: dhruba borthakur
>         Attachments: MAPREDUCE-2257.patch
>
>
> The minimum unit of work for a distcp task is a file. We have files larger than 1 TB with a block size of 1 GB. When distcp copies such a file, the task either takes a very long time or eventually fails. A better approach would be for distcp to copy all the source blocks in parallel and then stitch the blocks back into files at the destination via the HDFS Concat API (HDFS-222).
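
The approach in the description (one unit of work per block, then a stitch step at the destination) can be sketched on a local filesystem. This is only an illustration of the idea, not the attached patch and not HDFS code: the names `block_ranges`, `copy_block`, `stitch`, `parallel_copy` and `BLOCK_SIZE` are hypothetical, local part files stand in for HDFS blocks, and `stitch` is a plain append loop standing in for the HDFS Concat API, whose real constraints are not modelled here.

```python
import os
import tempfile
from concurrent.futures import ThreadPoolExecutor

BLOCK_SIZE = 4  # bytes; deliberately tiny so the sketch runs instantly (assumption)


def block_ranges(file_size, block_size):
    """Split [0, file_size) into (offset, length) pairs, one per block."""
    return [(off, min(block_size, file_size - off))
            for off in range(0, file_size, block_size)]


def copy_block(src, part_path, offset, length):
    """Copy a single block of src into its own part file (one unit of work)."""
    with open(src, "rb") as fin, open(part_path, "wb") as fout:
        fin.seek(offset)
        fout.write(fin.read(length))
    return part_path


def stitch(parts, dst):
    """Append the part files to dst in block order, then delete them.

    Local stand-in for a concat call at the destination (assumption).
    """
    with open(dst, "wb") as fout:
        for part in parts:
            with open(part, "rb") as fin:
                fout.write(fin.read())
            os.remove(part)


def parallel_copy(src, dst, block_size=BLOCK_SIZE, workers=4):
    """Copy src to dst block by block in parallel; return the block count."""
    ranges = block_ranges(os.path.getsize(src), block_size)
    parts = [f"{dst}.part{i:06d}" for i in range(len(ranges))]
    with ThreadPoolExecutor(max_workers=workers) as pool:
        futures = [pool.submit(copy_block, src, part, off, length)
                   for part, (off, length) in zip(parts, ranges)]
        for future in futures:
            future.result()  # re-raise any per-block failure before stitching
    stitch(parts, dst)
    return len(ranges)


if __name__ == "__main__":
    workdir = tempfile.mkdtemp()
    source = os.path.join(workdir, "source.bin")
    with open(source, "wb") as f:
        f.write(b"0123456789")
    blocks = parallel_copy(source, os.path.join(workdir, "copy.bin"))
    print("copied", blocks, "blocks")
```

The point of the split is that a failed or slow block only costs a retry of that block rather than of the whole file, which is the problem the description raises for files larger than 1 TB.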