hadoop-common-dev mailing list archives

From "Runping Qi (JIRA)" <j...@apache.org>
Subject [jira] Commented: (HADOOP-2141) speculative execution start up condition based on completion time
Date Thu, 15 Nov 2007 19:43:43 GMT

    [ https://issues.apache.org/jira/browse/HADOOP-2141?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel#action_12542861 ]

Runping Qi commented on HADOOP-2141:
------------------------------------


The above proposal sounds reasonable.
Here are some points to consider:

1. A speculative execution of a mapper (reducer) is started
only if there are no pending non-speculative mappers (reducers).

2. We should estimate the expected finish time of a mapper (reducer) based on its
current progress and progress rate.
A speculative execution of a mapper (reducer) is started only if its projected run time
(elapsed time plus estimated remaining time) is well beyond the average execution time
of mappers (reducers).

3. It is a bit tricky to compute the average execution time of reducers.
If a reducer started before the map phase completed, the overlap period should
be taken out.

4. If a reducer is stuck in the shuffle phase, the real reason for the stall may be
the machine(s) holding the map outputs it needs. Launching a speculative execution
of the reducer may not help.
In this case, we may need to declare the affected mappers lost and re-run them.






> speculative execution start up condition based on completion time
> -----------------------------------------------------------------
>
>                 Key: HADOOP-2141
>                 URL: https://issues.apache.org/jira/browse/HADOOP-2141
>             Project: Hadoop
>          Issue Type: Improvement
>          Components: mapred
>    Affects Versions: 0.15.0
>            Reporter: Koji Noguchi
>            Assignee: Arun C Murthy
>             Fix For: 0.16.0
>
>
> We had one job hang with speculative execution enabled.
> 4 reduce tasks were stuck at 95% completion because of a bad disk.
> Devaraj pointed out:
> bq. One of the conditions that must be met for launching a speculative instance of a task is that it must be at least 20% behind the average progress, and this is not true here.
> It would be nice if speculative execution also started when tasks stop making progress.
> Devaraj suggested:
> bq. Maybe we should introduce a condition on the average completion time of tasks in the speculative execution check.
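The following Python sketch shows why the existing condition did not fire here. It compares a progress-gap check with a check on average completion time. It assumes the average progress is taken over all reducers of the job, including finished ones. The 10-reducer job, the 600 s durations and the `factor` of 2 are hypothetical numbers, and neither function is Hadoop's actual implementation.

```python
from typing import List


def progress_gap_rule(task_progress: float, all_progress: List[float],
                      gap: float = 0.2) -> bool:
    """Condition as described in the issue: the task must be at least
    `gap` (20%) behind the average progress of the job's tasks."""
    avg = sum(all_progress) / len(all_progress)
    return avg - task_progress >= gap


def completion_time_rule(task_elapsed: float, completed_durations: List[float],
                         factor: float = 2.0) -> bool:
    """Hypothetical form of the suggested check: the task has already run
    `factor` times longer than the average completed task (factor is assumed)."""
    if not completed_durations:
        return False
    avg = sum(completed_durations) / len(completed_durations)
    return task_elapsed > factor * avg


# Hypothetical job: 6 reducers finished, 4 stuck at 95% on a bad disk.
progress = [1.0] * 6 + [0.95] * 4
print(progress_gap_rule(0.95, progress))          # False: average is about 0.98
print(completion_time_rule(3600.0, [600.0] * 6))  # True: 3600 > 2 * 600
```

With six reducers finished and four stuck at 95%, the average progress is about 0.98. A stuck reducer is therefore only about 3% behind, so the 20% rule never triggers. The time-based rule, by contrast, flags a reducer that has run for an hour when the finished ones took ten minutes.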

-- 
This message is automatically generated by JIRA.
-
You can reply to this email to add a comment to the issue online.

