hadoop-common-dev mailing list archives

From "eric baldeschwieler (JIRA)" <j...@apache.org>
Subject [jira] Commented: (HADOOP-1440) JobClient should not sort input-splits
Date Sat, 09 Jun 2007 05:40:26 GMT

    [ https://issues.apache.org/jira/browse/HADOOP-1440?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel#action_12503017 ]

eric baldeschwieler commented on HADOOP-1440:
---------------------------------------------

I tend to agree with runping.  The framework should reserve the right to execute maps in any
order it chooses.  Nailing down execution order will limit our ability to optimize later.
Also, why put sorting into user code?

It sounds like the need is to name the reduces, not control their order.  So why not address
that directly?  Perhaps outputs can be numbered according to their original submission order
in the case of -reducer NONE?  This need not pin down execution order.
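
To illustrate the idea, here is a minimal sketch, not Hadoop code.  The class and method
names (SplitOrderSketch, IndexedSplit, scheduleOrder, outputName) are hypothetical, and the
five-digit "part-" padding is an assumption.  Each split remembers the index it had when it
was submitted; the framework stays free to schedule the largest splits first, and the output
name is derived from the remembered index rather than from the execution position.

```java
import java.util.ArrayList;
import java.util.Comparator;
import java.util.List;
import java.util.Locale;

// Hypothetical sketch: none of these names come from the Hadoop API.
final class SplitOrderSketch {

    /** A split tagged with the position it had in the originally submitted list. */
    static final class IndexedSplit {
        final int originalIndex;
        final long length;

        IndexedSplit(int originalIndex, long length) {
            this.originalIndex = originalIndex;
            this.length = length;
        }
    }

    /** Returns the splits in scheduling order: largest first, ties in submission order. */
    static List<IndexedSplit> scheduleOrder(long[] splitLengths) {
        List<IndexedSplit> splits = new ArrayList<>();
        for (int i = 0; i < splitLengths.length; i++) {
            splits.add(new IndexedSplit(i, splitLengths[i]));
        }
        // List.sort is stable, so equal-sized splits keep their submission order.
        splits.sort(Comparator.comparingLong((IndexedSplit s) -> s.length).reversed());
        return splits;
    }

    /** Names the output after the submission index, not the execution position. */
    static String outputName(IndexedSplit split) {
        return String.format(Locale.ROOT, "part-%05d", split.originalIndex);
    }

    public static void main(String[] args) {
        // Three splits submitted in the order 0, 1, 2 with different sizes.
        for (IndexedSplit s : scheduleOrder(new long[] {10, 300, 20})) {
            System.out.println(outputName(s) + " <- split of " + s.length + " bytes");
        }
    }
}
```

With the sizes above, the 300-byte split runs first but still writes part-00001, so the
output numbering matches the input partitioning without constraining execution order.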

Sounds like perhaps we should deprecate map.input.file now that a more uniform mechanism exists
to get this info?

> JobClient should not sort input-splits
> --------------------------------------
>
>                 Key: HADOOP-1440
>                 URL: https://issues.apache.org/jira/browse/HADOOP-1440
>             Project: Hadoop
>          Issue Type: Improvement
>          Components: mapred
>    Affects Versions: 0.12.3
>         Environment: All
>            Reporter: Milind Bhandarkar
>            Assignee: Milind Bhandarkar
>             Fix For: 0.14.0
>
>
> Currently, the JobClient sorts the InputSplits returned by InputFormat in descending
> order of size, so that the map tasks corresponding to larger input-splits are scheduled for
> execution before smaller ones. However, this causes problems in applications that produce
> data-sets partitioned similarly to the input partition with -reducer NONE.
> With -reducer NONE, map task i produces part-i. However, in the typical applications that
> use -reducer NONE it should produce a partition that has the same index as the input partition.
> (Of course, this requires that each partition should be fed in its entirety to a map,
> rather than splitting it into blocks, but that is a separate issue.)
> Thus, sorting input splits should be either controllable via a configuration variable,
> or the FileInputFormat should sort the splits and JobClient should honor the order of splits.

-- 
This message is automatically generated by JIRA.
-
You can reply to this email to add a comment to the issue online.

