hadoop-common-dev mailing list archives

From "Milind Bhandarkar (JIRA)" <j...@apache.org>
Subject [jira] Created: (HADOOP-1440) JobClient should not sort input-splits
Date Tue, 29 May 2007 23:02:15 GMT
JobClient should not sort input-splits

                 Key: HADOOP-1440
                 URL: https://issues.apache.org/jira/browse/HADOOP-1440
             Project: Hadoop
          Issue Type: Improvement
          Components: mapred
    Affects Versions: 0.12.3
         Environment: All
            Reporter: Milind Bhandarkar
             Fix For: 0.14.0

Currently, the JobClient sorts the InputSplits returned by the InputFormat in descending order
of size, so that the map tasks for larger input splits are scheduled before those for smaller
ones. However, this causes problems in applications that use -reducer NONE to produce data sets
partitioned the same way as their input.

With -reducer NONE, map task i produces part-i. However, typical applications that use
-reducer NONE need each output partition to have the same index as the input partition it was
produced from, and sorting the splits by size breaks that correspondence.
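
To illustrate the mismatch, here is a minimal sketch in plain Java. It does not use the Hadoop API: the Split class and the outputName helper are hypothetical stand-ins, and the part-i naming simply follows the description above.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;

public class SplitOrderDemo {
    /** Hypothetical stand-in for an input split: original partition index plus length in bytes. */
    static final class Split {
        final int partition;
        final long length;

        Split(int partition, long length) {
            this.partition = partition;
            this.length = length;
        }
    }

    /** Mimics the behavior described in this issue: order splits by length, largest first. */
    static List<Split> sortedBySizeDescending(List<Split> splits) {
        List<Split> copy = new ArrayList<>(splits);
        copy.sort((a, b) -> Long.compare(b.length, a.length));
        return copy;
    }

    /** Name of the output written by map task i when no reducer runs (format is illustrative). */
    static String outputName(int taskIndex) {
        return "part-" + taskIndex;
    }

    public static void main(String[] args) {
        // Three input partitions of different sizes, listed in their original order.
        List<Split> splits = Arrays.asList(
                new Split(0, 100L), new Split(1, 300L), new Split(2, 200L));

        // Task index i is the position after sorting, not the original partition index.
        List<Split> scheduled = sortedBySizeDescending(splits);
        for (int i = 0; i < scheduled.size(); i++) {
            System.out.println(
                    "input partition " + scheduled.get(i).partition + " -> " + outputName(i));
        }
    }
}
```

With these made-up sizes, input partition 1 (the largest) ends up in part-0 and input partition 0 ends up in part-2, so the output index no longer identifies the input partition.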

(Of course, this requires that each partition be fed in its entirety to a single map, rather
than being split into blocks, but that is a separate issue.)

Thus, sorting input splits should be either controllable via a configuration variable, or
the FileInputFormat should sort the splits and JobClient should honor the order of splits.
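
The first option could look roughly like the sketch below. This is only an illustration in plain Java, not Hadoop code: the configuration key example.jobclient.sort.splits is a made-up name, a Map stands in for the job configuration, and splits are reduced to their lengths. Defaulting to "true" is an assumption chosen to preserve the current behavior.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.HashMap;
import java.util.List;
import java.util.Map;

public class SplitOrderingSketch {
    // Hypothetical configuration key for illustration; not an actual Hadoop property.
    static final String SORT_SPLITS_KEY = "example.jobclient.sort.splits";

    /**
     * Returns split lengths in the order tasks would be numbered.
     * If sorting is enabled (assumed default), larger splits come first;
     * otherwise the order supplied by the input format is kept unchanged.
     */
    static List<Long> orderSplits(List<Long> splitLengths, Map<String, String> conf) {
        List<Long> ordered = new ArrayList<>(splitLengths);
        boolean sort = Boolean.parseBoolean(conf.getOrDefault(SORT_SPLITS_KEY, "true"));
        if (sort) {
            ordered.sort((a, b) -> Long.compare(b, a));
        }
        return ordered;
    }

    public static void main(String[] args) {
        List<Long> lengths = Arrays.asList(100L, 300L, 200L);

        Map<String, String> conf = new HashMap<>();
        System.out.println("default:  " + orderSplits(lengths, conf));

        // A job that needs output index i to match input partition i turns sorting off.
        conf.put(SORT_SPLITS_KEY, "false");
        System.out.println("disabled: " + orderSplits(lengths, conf));
    }
}
```

The second option in the proposal would instead move the sort into the input format and have the client use the returned order as is, which would make a switch like this unnecessary.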
