hadoop-mapreduce-issues mailing list archives

From "Devaraj Das (JIRA)" <j...@apache.org>
Subject [jira] Commented: (MAPREDUCE-718) Support for per-phase speculative execution
Date Thu, 13 Aug 2009 08:39:14 GMT

    [ https://issues.apache.org/jira/browse/MAPREDUCE-718?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12742766#action_12742766 ]

Devaraj Das commented on MAPREDUCE-718:

The current speculative execution heuristic assumes that task inputs are homogeneous. This is not
always true; in practice, inputs are often skewed. As of today, we might end up launching speculative
tasks for the ones that are simply processing large inputs. We should base the choice of which
task to speculate on something that reflects actual progress and is not affected by skew in
the input.
One thought is to base the choice of speculative candidates on the rate of increase
of the relevant system/framework counters in each task phase. For example, for tasks in the
map phase, we can monitor the rate of increase of MAP_INPUT_RECORDS. For reduce tasks in the
shuffle phase, we can look at the SHUFFLE_BYTES counter, and so on. We also need to maintain
per-phase statistics for the counters, and take the current phase of a given task into account
when considering it for speculation.
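
To make the idea concrete, here is a minimal sketch (in Python, not JobTracker code) of picking
candidates by comparing each task's counter rate only against other tasks in the same phase.
The TaskSample type, the speculation_candidates function, and the "more than one standard
deviation below the phase mean" rule are all assumptions made for illustration; the comment
above does not specify a threshold.

```python
from dataclasses import dataclass
from statistics import mean, pstdev


@dataclass
class TaskSample:
    """Hypothetical per-task report: growth of the phase's counter over one interval."""
    task_id: str
    phase: str            # e.g. "map" (MAP_INPUT_RECORDS) or "shuffle" (SHUFFLE_BYTES)
    counter_delta: int    # increase in the phase's counter since the last report
    interval_secs: float  # time between the two reports

    @property
    def rate(self) -> float:
        return self.counter_delta / self.interval_secs


def speculation_candidates(samples, slow_factor=1.0):
    """Return ids of tasks whose rate is more than slow_factor standard
    deviations below the mean rate of the tasks in the same phase.
    The threshold rule is an assumption, not part of the proposal."""
    by_phase = {}
    for s in samples:
        by_phase.setdefault(s.phase, []).append(s)

    candidates = []
    for group in by_phase.values():
        if len(group) < 2:
            continue  # nothing to compare against
        rates = [s.rate for s in group]
        mu, sigma = mean(rates), pstdev(rates)
        if sigma == 0:
            continue  # all tasks in this phase progress at the same rate
        for s in group:
            if s.rate < mu - slow_factor * sigma:
                candidates.append(s.task_id)
    return candidates


samples = [
    TaskSample("m1", "map", 1000, 10.0),
    TaskSample("m2", "map", 1000, 10.0),
    TaskSample("m3", "map", 1000, 10.0),
    TaskSample("m4", "map", 100, 10.0),    # much slower than its map peers
    TaskSample("r1", "shuffle", 50, 10.0),  # low absolute rate, but equal to its peer
    TaskSample("r2", "shuffle", 50, 10.0),
]
print(speculation_candidates(samples))
```

Because rates are compared within a phase, the two shuffle tasks are not flagged even though
their absolute rates are lower than those of the map tasks; only m4 stands out among its peers.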

Today, the task counters are transmitted once a minute from the TaskTrackers. I propose that
we send the system counters in every heartbeat. The communication of the system counter
information can be made efficient by having a customized serialization for them. Since the
system counter names are known to both the JobTracker and the TaskTrackers, the order in which
the values are serialized/deserialized is enough to identify which particular counter we are
talking about. Also, the counter values can be serialized as vInts.
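
A rough sketch of what such a positional encoding could look like, in Python for brevity.
The SYSTEM_COUNTERS ordering (including its third entry) and all function names here are
hypothetical, and the varint shown is a generic 7-bits-per-byte scheme chosen for
illustration; it is not claimed to match the byte layout of Hadoop's own vInt encoding.

```python
# Hypothetical fixed ordering that both ends would agree on in advance.
# Only the first two names appear in the comment above.
SYSTEM_COUNTERS = ("MAP_INPUT_RECORDS", "SHUFFLE_BYTES", "REDUCE_INPUT_RECORDS")


def encode_varint(n: int) -> bytes:
    """Unsigned varint: 7 payload bits per byte, high bit set on all but the last byte."""
    if n < 0:
        raise ValueError("counter values must be non-negative")
    out = bytearray()
    while True:
        byte = n & 0x7F
        n >>= 7
        if n:
            out.append(byte | 0x80)  # more bytes follow
        else:
            out.append(byte)
            return bytes(out)


def decode_varint(buf: bytes, pos: int):
    """Decode one varint starting at pos; return (value, next_pos)."""
    result = 0
    shift = 0
    while True:
        b = buf[pos]
        pos += 1
        result |= (b & 0x7F) << shift
        if not b & 0x80:
            return result, pos
        shift += 7


def serialize(counters: dict) -> bytes:
    """Write only the values, in the agreed order; no names go on the wire."""
    return b"".join(encode_varint(counters.get(name, 0)) for name in SYSTEM_COUNTERS)


def deserialize(buf: bytes) -> dict:
    """Recover the name of each value from its position in the stream."""
    out = {}
    pos = 0
    for name in SYSTEM_COUNTERS:
        out[name], pos = decode_varint(buf, pos)
    return out


values = {"MAP_INPUT_RECORDS": 300, "SHUFFLE_BYTES": 5, "REDUCE_INPUT_RECORDS": 0}
wire = serialize(values)
print(len(wire), deserialize(wire) == values)
```

Small values take a single byte each, so a heartbeat carrying a handful of counters stays
compact, and no counter names are transmitted at all.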

@Zhang, for speculative execution, we don't migrate tasks.

> Support for per-phase speculative execution
> -------------------------------------------
>                 Key: MAPREDUCE-718
>                 URL: https://issues.apache.org/jira/browse/MAPREDUCE-718
>             Project: Hadoop Map/Reduce
>          Issue Type: Improvement
>          Components: jobtracker
>    Affects Versions: 0.21.0
>            Reporter: Devaraj Das
>             Fix For: 0.21.0
> It would be good to have support for per-phase speculative execution, where the algorithm
> looks at the current phase of a task and compares it with the other tasks in the same phase
> before deciding to launch a speculative task. That would have the following benefits:
> 1) Support for jobs where map task progress jumps from 0% to 100%. This is true for
> some jobs like randomwriter. Today, we would launch speculative tasks for such jobs (assuming
> that the tasks are not making progress), but most of them would be unnecessary.
> 2) In reality, for reduces, the three phases are quite different from each other, and
> they take different amounts of time too. We should see better results when we look at
> per-phase speculation.

This message is automatically generated by JIRA.
You can reply to this email to add a comment to the issue online.
