From: "Owen O'Malley (JIRA)"
To: hadoop-dev@lucene.apache.org
Date: Mon, 11 Jun 2007 13:40:26 -0700 (PDT)
Subject: [jira] Commented: (HADOOP-1440) JobClient should not sort input-splits

[ https://issues.apache.org/jira/browse/HADOOP-1440?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel#action_12503609 ]

Owen O'Malley commented on HADOOP-1440:
---------------------------------------

It is just an
easier thing to explain to users if the first split returned from getSplits becomes map-0, the second becomes map-1, and so on. From my point of view, the only problem is that right now the name of a task controls its scheduling order. The two should be independent of each other.

> JobClient should not sort input-splits
> --------------------------------------
>
>                 Key: HADOOP-1440
>                 URL: https://issues.apache.org/jira/browse/HADOOP-1440
>             Project: Hadoop
>          Issue Type: Improvement
>          Components: mapred
>    Affects Versions: 0.12.3
>         Environment: All
>            Reporter: Milind Bhandarkar
>            Assignee: Milind Bhandarkar
>             Fix For: 0.14.0
>
>
> Currently, the JobClient sorts the InputSplits returned by the InputFormat in descending order of size, so that map tasks for larger input splits are scheduled for execution before smaller ones. However, this causes problems for applications that run with -reducer NONE and produce data sets partitioned the same way as their input.
> With -reducer NONE, map task i produces part-i. However, in the typical applications that use -reducer NONE, each map should produce an output partition with the same index as its input partition.
> (Of course, this requires that each partition be fed in its entirety to a single map rather than split into blocks, but that is a separate issue.)
> Thus, either the sorting of input splits should be controllable via a configuration variable, or FileInputFormat should sort the splits and JobClient should honor their order.

-- 
This message is automatically generated by JIRA.
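The separation Owen describes, a task's index versus its position in the schedule, can be sketched in Java. This is a minimal illustration under stated assumptions, not Hadoop's JobClient code: Split, MapTask, createTasks, and schedulingOrder are hypothetical names, a split is reduced to a path and a length, and the zero-padded part-file name is assumed for illustration. Task ids follow the order returned by getSplits, so map i always writes part-i, while the scheduler separately visits tasks largest-first.

```java
import java.util.ArrayList;
import java.util.Comparator;
import java.util.List;

public class SplitScheduling {

    // Hypothetical stand-in for an input split: only path and length matter here.
    record Split(String path, long length) {}

    // A map task whose id is fixed by the position of its split in the
    // getSplits() result, independent of when it is scheduled.
    record MapTask(int id, Split split) {
        String outputName() {
            // With no reduce phase, map task i writes part-i
            // (zero-padded name format assumed for illustration).
            return String.format("part-%05d", id);
        }
    }

    // Assign task ids in the order the InputFormat returned the splits.
    static List<MapTask> createTasks(List<Split> splits) {
        List<MapTask> tasks = new ArrayList<>();
        for (int i = 0; i < splits.size(); i++) {
            tasks.add(new MapTask(i, splits.get(i)));
        }
        return tasks;
    }

    // Scheduling order is a separate permutation (largest splits first);
    // sorting a copy leaves the task ids and the original list untouched.
    static List<MapTask> schedulingOrder(List<MapTask> tasks) {
        List<MapTask> order = new ArrayList<>(tasks);
        order.sort(Comparator.comparingLong((MapTask t) -> t.split().length()).reversed());
        return order;
    }

    public static void main(String[] args) {
        List<Split> splits = List.of(
            new Split("in/part-00000", 10),
            new Split("in/part-00001", 30),
            new Split("in/part-00002", 20));
        List<MapTask> tasks = createTasks(splits);
        // Print tasks in the order they would be scheduled.
        for (MapTask t : schedulingOrder(tasks)) {
            System.out.println(t.split().path() + " -> " + t.outputName());
        }
    }
}
```

Running main lists the tasks largest-first (in/part-00001, in/part-00002, in/part-00000), yet each still writes the part file whose index matches its input. Fixing the id at creation time and sorting only a copy of the task list is what lets naming and scheduling vary independently.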