Date: Wed, 20 Mar 2013 17:27:18 +0000 (UTC)
From: "ledion bitincka (JIRA)"
To: mapreduce-issues@hadoop.apache.org
Reply-To: mapreduce-issues@hadoop.apache.org
Subject: [jira] [Updated] (MAPREDUCE-5085) JobClient reorders splits

     [ https://issues.apache.org/jira/browse/MAPREDUCE-5085?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

ledion bitincka updated MAPREDUCE-5085:
---------------------------------------
    Description:
The JobClient hard-codes the ordering of splits by descending size. While this may be fine for traditional batch MapReduce jobs, it is not well suited to map-only jobs where a client cares about the order in which the maps execute. Moreover, because the more expensive mappers always run early in the job, the cluster is loaded more heavily and is not utilized uniformly or smoothly over time.

{code}
...JobClient.java
private int writeNewSplits(JobContext job, Path jobSubmitDir)
    throws IOException, InterruptedException, ClassNotFoundException {
  ....
  // sort the splits into order based on size, so that the biggest
  // go first
  Arrays.sort(array, new SplitComparator());
  JobSplitWriter.createSplitFiles(jobSubmitDir, conf,
      jobSubmitDir.getFileSystem(conf), array);
  return array.length;
}
{code}

It should be straightforward to make the SplitComparator an instance variable of the JobClient and let consumers set it if they care about the order in which splits are attempted.

> JobClient reorders splits
> -------------------------
>
>                 Key: MAPREDUCE-5085
>                 URL: https://issues.apache.org/jira/browse/MAPREDUCE-5085
>             Project: Hadoop Map/Reduce
>          Issue Type: Bug
>            Reporter: ledion bitincka
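
The suggestion in the description (hold the comparator in a settable field instead of constructing it inline) could be sketched roughly as below. This is only an illustration under stated assumptions, not a patch against Hadoop: SketchSplit, SketchSubmitter, BIGGEST_FIRST, setSplitComparator and orderSplits are all hypothetical names, and treating a null comparator as "keep the original order" is a design choice made for this sketch.

```java
import java.util.Arrays;
import java.util.Comparator;

// Hypothetical stand-ins only; none of these names are Hadoop classes.
class SplitOrderingSketch {

    /** Minimal stand-in for an input split: a label plus a size in bytes. */
    static final class SketchSplit {
        final String name;
        final long length;

        SketchSplit(String name, long length) {
            this.name = name;
            this.length = length;
        }
    }

    /** Mirrors the behaviour described in the issue: biggest split first. */
    static final Comparator<SketchSplit> BIGGEST_FIRST =
        (a, b) -> Long.compare(b.length, a.length);

    /** Stand-in for the submitting client, with the comparator as a field. */
    static final class SketchSubmitter {
        private Comparator<SketchSplit> splitComparator = BIGGEST_FIRST;

        /** Assumption of this sketch: null means "do not reorder". */
        void setSplitComparator(Comparator<SketchSplit> comparator) {
            this.splitComparator = comparator;
        }

        /** Returns a copy of the splits in the order they would be written. */
        SketchSplit[] orderSplits(SketchSplit[] splits) {
            SketchSplit[] ordered = Arrays.copyOf(splits, splits.length);
            if (splitComparator != null) {
                Arrays.sort(ordered, splitComparator);
            }
            return ordered;
        }
    }

    public static void main(String[] args) {
        SketchSplit[] splits = {
            new SketchSplit("a", 10), new SketchSplit("b", 30), new SketchSplit("c", 20)
        };
        SketchSubmitter submitter = new SketchSubmitter();

        // Default: sorted by descending size.
        for (SketchSplit s : submitter.orderSplits(splits)) {
            System.out.println("default: " + s.name + " (" + s.length + ")");
        }

        // A caller that cares about execution order opts out of sorting.
        submitter.setSplitComparator(null);
        for (SketchSplit s : submitter.orderSplits(splits)) {
            System.out.println("as given: " + s.name + " (" + s.length + ")");
        }
    }
}
```

Keeping descending size as the default would leave existing jobs unaffected, while a map-only job could supply its own comparator or disable sorting altogether.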