Date: Wed, 15 Feb 2017 10:53:42 +0000 (UTC)
From: "Zheng Shao (JIRA)"
To: common-issues@hadoop.apache.org
Subject: [jira] [Updated] (HADOOP-13975) Allow DistCp to use MultiThreadedMapper

    [ https://issues.apache.org/jira/browse/HADOOP-13975?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Zheng Shao updated HADOOP-13975:
--------------------------------
    Attachment: HADOOP-distcp-multithreaded-mapper-branch26.6.patch

> Allow DistCp to use MultiThreadedMapper
> ---------------------------------------
>
>                 Key: HADOOP-13975
>                 URL: https://issues.apache.org/jira/browse/HADOOP-13975
>             Project: Hadoop Common
>          Issue Type: New Feature
>          Components: tools/distcp
>    Affects Versions: 3.0.0-alpha1
>            Reporter: Zheng Shao
>            Assignee: Zheng Shao
>            Priority: Minor
>         Attachments: HADOOP-distcp-multithreaded-mapper-branch26.1.patch, HADOOP-distcp-multithreaded-mapper-branch26.2.patch, HADOOP-distcp-multithreaded-mapper-branch26.3.patch, HADOOP-distcp-multithreaded-mapper-branch26.4.patch, HADOOP-distcp-multithreaded-mapper-branch26.5.patch, HADOOP-distcp-multithreaded-mapper-branch26.6.patch, HADOOP-distcp-multithreaded-mapper-trunk.1.patch, HADOOP-distcp-multithreaded-mapper-trunk.2.patch, HADOOP-distcp-multithreaded-mapper-trunk.3.patch, HADOOP-distcp-multithreaded-mapper-trunk.4.patch, HADOOP-distcp-multithreaded-mapper-trunk.5.patch
>
>
> Although distcp allows users to control parallelism through the number of mappers, it is sometimes desirable to run fewer mappers with more threads per mapper. Since distcp is network-bound (either by throughput or, more frequently, by the latency of creating connections, opening files, reading/writing files, and closing files), this can make each mapper much more efficient. When the WebHDFS protocol is used as either the source or the target, this multithreaded approach can also make reuse of HTTP connections (to the NameNode) more efficient.
> In this way, the threads within a mapper can share many resources, which saves memory and reduces the number of connections to the NameNode.
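The idea of trading mappers for threads can be sketched in a few lines of Python. This is an illustration only, not the patch's implementation: `SharedClient`, `copy_file`, and `run_mapper` are hypothetical names, and the client merely counts requests instead of talking to HDFS or WebHDFS. What it shows is one "mapper" draining its share of copy tasks with several worker threads that all reuse a single connection-holding client.

```python
from concurrent.futures import ThreadPoolExecutor
import threading


class SharedClient:
    """Hypothetical stand-in for a connection-holding client (e.g. to the NameNode).

    Not a Hadoop API: it only records how many requests it served, so the
    effect of sharing one client across threads is observable.
    """

    def __init__(self):
        self._lock = threading.Lock()
        self.requests = 0

    def copy_file(self, path):
        # A real client would open, read/write and close the file here.
        # The lock protects the shared counter, since += is not atomic across threads.
        with self._lock:
            self.requests += 1
        return path


def run_mapper(paths, num_threads):
    """One 'mapper' copying its share of paths with num_threads worker threads.

    All threads reuse a single client, so the mapper holds one set of
    connections instead of one per concurrent copy.
    """
    client = SharedClient()
    with ThreadPoolExecutor(max_workers=num_threads) as pool:
        # pool.map returns results in the same order as the input paths.
        copied = list(pool.map(client.copy_file, paths))
    return copied, client


if __name__ == "__main__":
    paths = [f"/src/file-{i}" for i in range(100)]
    copied, client = run_mapper(paths, num_threads=8)
    print(len(copied), client.requests)  # 100 100
```

Threads fit this workload because copying is dominated by waiting on the network (opening, reading/writing, and closing files), and blocking I/O releases the worker while it waits; for CPU-bound work, adding threads per mapper would help far less.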