hadoop-common-dev mailing list archives

From "Milind Bhandarkar (JIRA)" <j...@apache.org>
Subject [jira] Created: (HADOOP-1440) JobClient should not sort input-splits
Date Tue, 29 May 2007 23:02:15 GMT
JobClient should not sort input-splits
--------------------------------------

                 Key: HADOOP-1440
                 URL: https://issues.apache.org/jira/browse/HADOOP-1440
             Project: Hadoop
          Issue Type: Improvement
          Components: mapred
    Affects Versions: 0.12.3
         Environment: All
            Reporter: Milind Bhandarkar
             Fix For: 0.14.0


Currently, the JobClient sorts the InputSplits returned by the InputFormat in descending order
of size, so that the map tasks for larger input splits are scheduled for execution before those
for smaller ones. However, this causes problems for applications that run with -reducer NONE
and produce data sets partitioned in the same way as their input.

With -reducer NONE, map task i produces part-i. However, in the typical applications that use
-reducer NONE, each map should produce an output partition with the same index as the input
partition it read.
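
To make the index mismatch concrete, here is a minimal sketch. It is not JobClient code: the
class SplitOrderDemo and its method are hypothetical, splits are modeled only by their lengths,
and it uses present-day Java library calls. It computes which input partition ends up behind
each map task index once the splits are sorted largest-first.

```java
import java.util.Arrays;
import java.util.Comparator;

/** Illustrative only: splits are modeled as plain lengths, not Hadoop InputSplits. */
final class SplitOrderDemo {

    /**
     * Returns, for each map task index, the index of the input partition that
     * task would read after the splits are sorted by descending length.
     */
    static int[] partitionForTask(long[] splitLengths) {
        Integer[] order = new Integer[splitLengths.length];
        for (int i = 0; i < order.length; i++) {
            order[i] = i;
        }
        // Stable sort of the original indices, largest split first.
        Arrays.sort(order,
                Comparator.comparingLong((Integer i) -> splitLengths[i]).reversed());

        int[] result = new int[order.length];
        for (int task = 0; task < order.length; task++) {
            result[task] = order[task];
        }
        return result;
    }

    public static void main(String[] args) {
        // Assumed example sizes for input partitions 0, 1 and 2.
        long[] lengths = {10L, 30L, 20L};
        int[] mapping = partitionForTask(lengths);
        for (int task = 0; task < mapping.length; task++) {
            System.out.println("part-" + task + " <- input partition " + mapping[task]);
        }
    }
}
```

With the assumed lengths 10, 30 and 20, part-0 is written from input partition 1, part-1 from
partition 2, and part-2 from partition 0, so no output index matches its input index.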

(Of course, this requires that each partition should be fed in its entirety to a map, rather
than splitting it into blocks, but that is a separate issue.)

Thus, either the sorting of input splits should be controllable via a configuration variable, or
the FileInputFormat should sort the splits and the JobClient should honor the order it is given.
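
The first option could look roughly like the sketch below. Everything in it is an assumption
made for illustration: the class ConfigurableSplitOrder, the property name
example.jobclient.sort.splits, and the use of java.util.Properties in place of Hadoop's job
configuration. It shows only the shape of the switch, not a proposed patch.

```java
import java.util.Arrays;
import java.util.Properties;

/** Illustrative only: a config switch that decides whether splits are reordered. */
final class ConfigurableSplitOrder {

    // Hypothetical property name; this is not an actual Hadoop configuration key.
    static final String SORT_SPLITS_KEY = "example.jobclient.sort.splits";

    /**
     * Returns the split lengths in the order tasks would be created.
     * Sorting (largest first) stays the default; setting the property to
     * "false" preserves the order in which the splits were supplied.
     */
    static long[] orderSplits(long[] splitLengths, Properties conf) {
        long[] result = splitLengths.clone();
        boolean sort = Boolean.parseBoolean(conf.getProperty(SORT_SPLITS_KEY, "true"));
        if (sort) {
            Arrays.sort(result); // ascending
            // Reverse in place to get descending order.
            for (int i = 0, j = result.length - 1; i < j; i++, j--) {
                long tmp = result[i];
                result[i] = result[j];
                result[j] = tmp;
            }
        }
        return result;
    }

    public static void main(String[] args) {
        long[] lengths = {10L, 30L, 20L};

        Properties keepOrder = new Properties();
        keepOrder.setProperty(SORT_SPLITS_KEY, "false");

        System.out.println(Arrays.toString(orderSplits(lengths, new Properties())));
        System.out.println(Arrays.toString(orderSplits(lengths, keepOrder)));
    }
}
```

Keeping the sort as the default would preserve the current scheduling behavior, while jobs run
with -reducer NONE could switch it off to keep output indices aligned with input indices.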

-- 
This message is automatically generated by JIRA.
-
You can reply to this email to add a comment to the issue online.

