Date: Wed, 18 May 2016 06:44:12 +0000 (UTC)
From: "Rajesh Balamohan (JIRA)"
To: common-issues@hadoop.apache.org
Subject: [jira] [Updated] (HADOOP-13169) Randomize file list in SimpleCopyListing

     [ https://issues.apache.org/jira/browse/HADOOP-13169?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Rajesh Balamohan updated HADOOP-13169:
--------------------------------------
    Attachment: HADOOP-13169-branch-2-002.patch

Removed TestOptionsParser change which was already available in branch-2.

> Randomize file list in SimpleCopyListing
> ----------------------------------------
>
>                 Key: HADOOP-13169
>                 URL: https://issues.apache.org/jira/browse/HADOOP-13169
>             Project: Hadoop Common
>          Issue Type: Improvement
>          Components: tools/distcp
>            Reporter: Rajesh Balamohan
>            Priority: Minor
>         Attachments: HADOOP-13169-branch-2-001.patch, HADOOP-13169-branch-2-002.patch
>
>
> When copying files to S3, some mappers can run into S3 partition hotspots, depending on the file listing. This is more visible when data is copied from a Hive warehouse with many partitions (e.g. date partitions). In such cases, some tasks tend to be much slower than others. It would be good to randomize the file paths that are written out in SimpleCopyListing to avoid this issue.
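
The randomization proposed in the issue description can be illustrated with a minimal sketch. This is not the code from the attached patches: the class `ListingShuffleSketch`, the method `shuffled`, the seed parameter, and the example warehouse paths are all hypothetical, and plain strings stand in for whatever entries the real copy listing holds. The only idea shown is that shuffling a listing before it is handed out spreads files from the same date partition across the output instead of leaving them adjacent.

```java
import java.util.ArrayList;
import java.util.Collections;
import java.util.List;
import java.util.Random;

// Hypothetical helper for illustration only; not taken from the HADOOP-13169 patches.
class ListingShuffleSketch {

    /** Returns a shuffled copy of the given paths; the input list is not modified. */
    static List<String> shuffled(List<String> paths, long seed) {
        List<String> copy = new ArrayList<>(paths);
        // A fixed seed makes the resulting order reproducible, which helps in tests.
        Collections.shuffle(copy, new Random(seed));
        return copy;
    }

    public static void main(String[] args) {
        // Made-up listing: files grouped by date partition, as a sorted listing would be.
        List<String> listing = new ArrayList<>();
        for (int day = 1; day <= 3; day++) {
            for (int file = 0; file < 2; file++) {
                listing.add(String.format("/warehouse/t/dt=2016-05-%02d/part-%05d", day, file));
            }
        }
        // After shuffling, consecutive entries no longer share a partition prefix by construction.
        for (String path : shuffled(listing, 42L)) {
            System.out.println(path);
        }
    }
}
```

Whether a seeded or unseeded shuffle is preferable is a design choice this sketch leaves open; a seed is used here only so that repeated runs produce the same order.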