Date: Fri, 31 Mar 2017 16:07:42 +0000 (UTC)
From: "Yongjun Zhang (JIRA)"
To: common-issues@hadoop.apache.org
Subject: [jira] [Commented] (HADOOP-11794) Enable distcp to copy blocks in parallel

    [ https://issues.apache.org/jira/browse/HADOOP-11794?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15951193#comment-15951193 ]

Yongjun Zhang commented on HADOOP-11794:
----------------------------------------

Welcome [~omkarksa],

I will be working on backporting to other branches asap. Do you have specific expectations?

Just to clarify, I did not mean that we should not consider supporting other file systems; rather, I was suggesting working on that as a separate jira. Your help testing ADLS, together with [~steve_l]'s suggestion about checking concat support (UnsupportedException), made it easier for us to relax the file system constraint in this jira. So thank you guys again!

BTW, Steve still has an item for you to follow up on here :-)

https://issues.apache.org/jira/browse/HADOOP-11794?focusedCommentId=15938217&page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel#comment-15938217

Thanks.

> Enable distcp to copy blocks in parallel
> ----------------------------------------
>
>                 Key: HADOOP-11794
>                 URL: https://issues.apache.org/jira/browse/HADOOP-11794
>             Project: Hadoop Common
>          Issue Type: Improvement
>          Components: tools/distcp
>   Affects Versions: 0.21.0
>            Reporter: dhruba borthakur
>            Assignee: Yongjun Zhang
>         Attachments: HADOOP-11794.001.patch, HADOOP-11794.002.patch, HADOOP-11794.003.patch, HADOOP-11794.004.patch, HADOOP-11794.005.patch, HADOOP-11794.006.patch, HADOOP-11794.007.patch, HADOOP-11794.008.patch, HADOOP-11794.009.patch, HADOOP-11794.010.patch, MAPREDUCE-2257.patch
>
>
> The minimum unit of work for a distcp task is a file. We have files that are greater than 1 TB with a block size of 1 GB. If we use distcp to copy these files, the tasks either take a very long time or eventually fail. A better way for distcp would be to copy all the source blocks in parallel, and then stitch the blocks back into files at the destination via the HDFS Concat API (HDFS-222).
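
To make the idea in the issue description concrete, here is a rough local-filesystem analogy of "copy the pieces in parallel, then stitch them together at the destination". It is only a sketch under stated assumptions, not how the attached patches implement it: it uses plain java.nio instead of the Hadoop FileSystem API, fixed-size byte ranges stand in for HDFS blocks, and the final stitch is an ordinary sequential append rather than the HDFS Concat API. The names ChunkedCopySketch, copyChunk, and copyInChunks are made up for this illustration.

```java
import java.io.IOException;
import java.io.OutputStream;
import java.nio.channels.FileChannel;
import java.nio.file.Files;
import java.nio.file.Path;
import java.nio.file.StandardOpenOption;
import java.util.ArrayList;
import java.util.List;
import java.util.concurrent.ExecutionException;
import java.util.concurrent.ExecutorService;
import java.util.concurrent.Executors;
import java.util.concurrent.Future;

// Hypothetical illustration only: local files stand in for HDFS files,
// and fixed-size byte ranges stand in for HDFS blocks.
final class ChunkedCopySketch {

    /** Copies bytes [offset, offset + length) of src into its own chunk file. */
    static Path copyChunk(Path src, Path chunkFile, long offset, long length) throws IOException {
        try (FileChannel in = FileChannel.open(src, StandardOpenOption.READ);
             FileChannel out = FileChannel.open(chunkFile, StandardOpenOption.CREATE,
                     StandardOpenOption.WRITE, StandardOpenOption.TRUNCATE_EXISTING)) {
            long copied = 0;
            while (copied < length) {
                long n = in.transferTo(offset + copied, length - copied, out);
                if (n <= 0) {
                    break; // source ended earlier than expected
                }
                copied += n;
            }
        }
        return chunkFile;
    }

    /** Copies src to dst by copying chunks in parallel, then stitching them in order. */
    static void copyInChunks(Path src, Path dst, long chunkSize, int threads)
            throws IOException, InterruptedException, ExecutionException {
        if (chunkSize <= 0 || threads <= 0) {
            throw new IllegalArgumentException("chunkSize and threads must be positive");
        }
        long size = Files.size(src);
        Path tmpDir = Files.createTempDirectory("chunks");
        ExecutorService pool = Executors.newFixedThreadPool(threads);
        List<Path> chunks = new ArrayList<>();
        try {
            // Phase 1: one task per byte range; tasks run concurrently on the pool.
            List<Future<Path>> pending = new ArrayList<>();
            int index = 0;
            for (long offset = 0; offset < size; offset += chunkSize) {
                final long start = offset;
                final long length = Math.min(chunkSize, size - offset);
                final Path chunkFile = tmpDir.resolve("chunk-" + index++);
                pending.add(pool.submit(() -> copyChunk(src, chunkFile, start, length)));
            }
            // Futures are collected in submission order, so chunk order is preserved.
            for (Future<Path> f : pending) {
                chunks.add(f.get());
            }
        } finally {
            pool.shutdown();
        }
        // Phase 2: stitch the chunks back into one file (stand-in for a concat call).
        try (OutputStream out = Files.newOutputStream(dst)) {
            for (Path chunk : chunks) {
                Files.copy(chunk, out);
                Files.delete(chunk);
            }
        }
        Files.delete(tmpDir);
    }
}
```

For example, ChunkedCopySketch.copyInChunks(src, dst, 1024, 4) would copy src in 1 KB pieces on four threads. The point of the split is that a single huge file no longer has to be handled by one long-running task; on a file system with a concat-style operation, the stitch step could avoid rewriting the data, which this local sketch does not attempt to model.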