hadoop-mapreduce-issues mailing list archives

From "ledion bitincka (JIRA)" <j...@apache.org>
Subject [jira] [Updated] (MAPREDUCE-5085) JobClient reorders splits
Date Wed, 20 Mar 2013 17:27:18 GMT

     [ https://issues.apache.org/jira/browse/MAPREDUCE-5085?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

ledion bitincka updated MAPREDUCE-5085:
---------------------------------------

    Description: 
The JobClient hard-codes the ordering of splits by descending size. While this may be fine for
traditional/batch MR jobs, it is not well suited to map-only jobs where a client is interested
in the order of map executions. Moreover, by consistently running the more expensive mappers early
in the job, the cluster is taxed more heavily up front and is not utilized uniformly/smoothly over time.


{code}
...JobClient.java
  private <T extends InputSplit>
  int writeNewSplits(JobContext job, Path jobSubmitDir) throws IOException,
      InterruptedException, ClassNotFoundException {
....
    // sort the splits into order based on size, so that the biggest
    // go first
    Arrays.sort(array, new SplitComparator());
    JobSplitWriter.createSplitFiles(jobSubmitDir, conf,
        jobSubmitDir.getFileSystem(conf), array);
    return array.length;
  }
{code}

It should be straightforward to make the SplitComparator an instance variable of the JobClient
and allow it to be set by the consumers if they care about the order in which splits are attempted
to run.
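
A minimal sketch of the idea above, not actual Hadoop code and not a proposed patch. To stay
self-contained it uses a hypothetical stand-in class (SimpleSplit) instead of InputSplit, and a
hypothetical holder class (SplitOrderingSketch) instead of JobClient; the setter name
setSplitComparator is likewise an assumption. The default comparator keeps the current
biggest-first behaviour, and passing null is assumed here to mean "leave the splits in the order
the input format produced them".

```java
import java.util.Arrays;
import java.util.Comparator;

// Hypothetical stand-in for an input split; only a name and a length are modelled.
final class SimpleSplit {
    final String name;
    final long length;

    SimpleSplit(String name, long length) {
        this.name = name;
        this.length = length;
    }
}

// Hypothetical stand-in for the part of JobClient that orders splits before writing them.
final class SplitOrderingSketch {
    // Default ordering: largest split first, matching the excerpt's comment.
    static final Comparator<SimpleSplit> LARGEST_FIRST =
        Comparator.comparingLong((SimpleSplit s) -> s.length).reversed();

    // The comparator is an instance variable instead of being created inline.
    private Comparator<SimpleSplit> splitComparator = LARGEST_FIRST;

    // Assumed setter: callers that care about execution order supply their own
    // comparator, or null to keep the original order.
    void setSplitComparator(Comparator<SimpleSplit> comparator) {
        this.splitComparator = comparator;
    }

    // Returns an ordered copy; the caller's array is left untouched.
    SimpleSplit[] orderSplits(SimpleSplit[] splits) {
        SimpleSplit[] ordered = splits.clone();
        if (splitComparator != null) {
            Arrays.sort(ordered, splitComparator);
        }
        return ordered;
    }

    // Helper for the demo: joins split names with commas.
    static String names(SimpleSplit[] splits) {
        StringBuilder sb = new StringBuilder();
        for (SimpleSplit s : splits) {
            if (sb.length() > 0) {
                sb.append(',');
            }
            sb.append(s.name);
        }
        return sb.toString();
    }

    public static void main(String[] args) {
        SimpleSplit[] splits = {
            new SimpleSplit("a", 10), new SimpleSplit("b", 30), new SimpleSplit("c", 20)
        };
        SplitOrderingSketch submitter = new SplitOrderingSketch();
        System.out.println(names(submitter.orderSplits(splits))); // prints b,c,a

        submitter.setSplitComparator(null); // keep the original order
        System.out.println(names(submitter.orderSplits(splits))); // prints a,b,c
    }
}
```

With something along these lines, a map-only job that depends on execution order could opt out
of the size-based sort while existing jobs keep the default.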

  was:
The JobClient hard-codes the ordering of splits by descending size. While this may be fine for
traditional/batch MR jobs, it is not well suited to map-only jobs where a client is interested
in the order of map executions. Moreover, by consistently running the more expensive mappers early
in the job, the cluster is taxed more heavily up front and is not utilized uniformly/smoothly over time.


{code}
...JobClient.java
  private <T extends InputSplit>
  int writeNewSplits(JobContext job, Path jobSubmitDir) throws IOException,
      InterruptedException, ClassNotFoundException {
....
    // sort the splits into order based on size, so that the biggest
    // go first
    Arrays.sort(array, new SplitComparator());
    JobSplitWriter.createSplitFiles(jobSubmitDir, conf,
        jobSubmitDir.getFileSystem(conf), array);
    return array.length;
  }
{code>

It should be straightforward to make the SplitComparator an instance variable of the JobClient
and allow it to be set by the consumers if they care about the order in which splits are attempted
to run.

    
> JobClient reorders splits 
> --------------------------
>
>                 Key: MAPREDUCE-5085
>                 URL: https://issues.apache.org/jira/browse/MAPREDUCE-5085
>             Project: Hadoop Map/Reduce
>          Issue Type: Bug
>            Reporter: ledion bitincka
>

--
This message is automatically generated by JIRA.
If you think it was sent incorrectly, please contact your JIRA administrators.
For more information on JIRA, see: http://www.atlassian.com/software/jira
