hadoop-common-dev mailing list archives

From "Arun C Murthy (JIRA)" <j...@apache.org>
Subject [jira] Commented: (HADOOP-2141) speculative execution start up condition based on completion time
Date Thu, 15 Nov 2007 20:14:43 GMT

    [ https://issues.apache.org/jira/browse/HADOOP-2141?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel#action_12542869 ]

Arun C Murthy commented on HADOOP-2141:
---------------------------------------

Thanks for your comments Runping, some thoughts of my own...

----

bq. 1. A speculative execution of a mapper (reducer) is started only if there are no pending non-speculative mappers (reducers).

I believe this is already the case for choosing speculative tasks... I'll double-check.
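
For illustration, rule 1 amounts to a simple ordering in the scheduler: hand out pending work first, and consider a speculative attempt only when nothing is pending. The sketch below is hypothetical; `SpeculationGate`, `nextToLaunch` and the use of plain task-ID strings are assumptions made for this example, not the JobTracker's actual code.

```java
import java.util.List;
import java.util.Optional;

/**
 * Hypothetical sketch of rule 1: a speculative attempt is considered only
 * when no non-speculative (not yet started) task of the same kind is pending.
 * Task IDs are plain strings here; this is not Hadoop's scheduler API.
 */
final class SpeculationGate {

    /**
     * Returns the next task to launch: the first pending task if any exist,
     * otherwise the first speculative candidate, otherwise nothing.
     */
    static Optional<String> nextToLaunch(List<String> pending,
                                         List<String> speculativeCandidates) {
        if (!pending.isEmpty()) {
            return Optional.of(pending.get(0)); // pending work always wins
        }
        // Nothing pending: only now fall back to speculation.
        return speculativeCandidates.stream().findFirst();
    }
}
```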


bq. 2. We should estimate the expected finish time of a mapper (reducer) based on its current progress and progress rate. A speculative execution of a mapper (reducer) is started only if its projected completion time is far beyond the average execution time of mappers (reducers).

Hmm... I'm concerned this could lead to some overly aggressive speculative reduce tasks in cases like the one Koji reported. Do you see a way to do this more conservatively and yet keep it simple?
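
One possible reading of rule 2, sketched under stated assumptions: if a task has reached progress p (between 0 and 1) after elapsed time e and keeps its average rate p/e, its projected total run time is e/p. A speculative attempt is then warranted only when that projection exceeds some multiple of the average run time of completed tasks. The class `ProjectedFinish` and the knobs `slowFactor` and `minElapsedMs` are hypothetical; a larger `slowFactor` plus a minimum elapsed time is one way to address the concern about spawning reduces too aggressively.

```java
/**
 * Hypothetical sketch of rule 2: project a task's total run time from its
 * progress so far and speculate only if that projection is well beyond the
 * average run time of completed tasks of the same kind.
 * slowFactor and minElapsedMs are illustrative knobs, not Hadoop settings.
 */
final class ProjectedFinish {

    /** Projected total duration in ms, assuming the average rate so far holds. */
    static double projectedDurationMs(double progress, long elapsedMs) {
        if (progress <= 0.0) {
            return Double.POSITIVE_INFINITY; // no progress yet: cannot extrapolate
        }
        // rate = progress / elapsed, so total = 1 / rate = elapsed / progress
        return elapsedMs / progress;
    }

    static boolean shouldSpeculate(double progress, long elapsedMs,
                                   double avgCompletedMs, double slowFactor,
                                   long minElapsedMs) {
        if (elapsedMs < minElapsedMs) {
            return false; // too early to judge; keeps speculation conservative
        }
        return projectedDurationMs(progress, elapsedMs) > slowFactor * avgCompletedMs;
    }
}
```

A task stalled at 95%, as in Koji's report, eventually qualifies under this rule: its progress stays fixed while elapsed time grows, so e/p grows without bound.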


bq. 3. It is a bit tricky to compute the average execution time of reducers. If a reducer started before the map phase completed, then the overlapping period should be excluded.

Ok, I agree in principle, yet I'm concerned that this is overkill.
We could just subtract the time it took for all mappers to finish... I'm not very sure.
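
A minimal sketch of the overlap correction in rule 3, assuming start and finish timestamps for each completed reducer and the time at which the last map finished: only the part of each reducer's run after the map phase ended is counted. `ReduceTiming` and `ReduceRun` are hypothetical names. For a reducer launched at job start, and assuming the map phase also starts then, this gives the same result as subtracting the whole map-phase duration.

```java
import java.util.List;

/**
 * Hypothetical sketch of rule 3: average reducer run time with the part that
 * overlapped the map phase removed. Times are in milliseconds; ReduceRun is
 * an illustrative stand-in, not a Hadoop type.
 */
final class ReduceTiming {

    record ReduceRun(long startMs, long finishMs) {}

    /** Run time after the map phase ended (the whole run if it started later). */
    static long effectiveDurationMs(ReduceRun r, long mapPhaseEndMs) {
        long effectiveStart = Math.max(r.startMs(), mapPhaseEndMs);
        return Math.max(0L, r.finishMs() - effectiveStart);
    }

    /** Average effective duration; returns 0.0 when no reducer has completed. */
    static double averageEffectiveMs(List<ReduceRun> completed, long mapPhaseEndMs) {
        return completed.stream()
                .mapToLong(r -> effectiveDurationMs(r, mapPhaseEndMs))
                .average()
                .orElse(0.0);
    }
}
```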


bq. 4. If a reducer is stuck in the shuffle phase, the real reason for the stall may be related to the machine(s) holding the needed map outputs. Launching a speculative execution of the reducer may not help. In this case, we may need to declare the concerned mappers lost and re-run them.

I'm hoping HADOOP-1128 and, more recently, HADOOP-1984 take care of this, as long as we aren't too aggressive about starting speculative reduces.
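
To make the alternative in rule 4 concrete: rather than speculating a reducer stuck in the shuffle, count the failures reported for fetching each map's output and re-run that map once a threshold is reached. This is a hypothetical sketch of that idea only; it does not describe what HADOOP-1128 or HADOOP-1984 implement, and `FetchFailureTracker`, `reportFetchFailure` and the threshold are illustrative.

```java
import java.util.HashMap;
import java.util.Map;

/**
 * Hypothetical sketch of rule 4: instead of speculating a reducer stuck in
 * the shuffle, count reported failures to fetch each map's output and mark
 * that map for re-execution once a threshold is reached.
 * The threshold and method names are illustrative, not Hadoop's.
 */
final class FetchFailureTracker {
    private final int threshold;
    private final Map<String, Integer> failuresByMap = new HashMap<>();

    FetchFailureTracker(int threshold) {
        this.threshold = threshold;
    }

    /**
     * Records one failed fetch of mapId's output. Returns true when the
     * threshold is reached, i.e. the map should be declared lost and re-run.
     */
    boolean reportFetchFailure(String mapId) {
        int count = failuresByMap.merge(mapId, 1, Integer::sum);
        if (count >= threshold) {
            failuresByMap.remove(mapId); // start counting afresh for the re-run
            return true;
        }
        return false;
    }
}
```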

----

Overall, I very much want to keep this reasonably simple, at least as a first pass, until we have a chance to see it in action in the real world. We can then iterate...


> speculative execution start up condition based on completion time
> -----------------------------------------------------------------
>
>                 Key: HADOOP-2141
>                 URL: https://issues.apache.org/jira/browse/HADOOP-2141
>             Project: Hadoop
>          Issue Type: Improvement
>          Components: mapred
>    Affects Versions: 0.15.0
>            Reporter: Koji Noguchi
>            Assignee: Arun C Murthy
>             Fix For: 0.16.0
>
>
> We had one job with speculative execution hang.
> 4 reduce tasks were stuck at 95% completion because of a bad disk.
> Devaraj pointed out 
> bq. One of the conditions that must be met for launching a speculative instance of a task is that it must be at least 20% behind the average progress, and this is not true here.
> It would be nice if speculative execution also starts up when tasks stop making progress.
> Devaraj suggested 
> bq. Maybe we should introduce a condition based on the average completion time of tasks in the speculative execution check.
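
The two conditions in the description can be contrasted in a short sketch. Here the 20% rule is read as an absolute gap of 0.2 in progress units, which is an assumption, and the proposed completion-time condition is modeled as "running longer than `timeFactor` times the average completion time". `SpeculationCheck` and `timeFactor` are hypothetical. If the other reducers average 99% progress, a task stuck at 95% fails the gap test, so only the time-based check can trigger speculation.

```java
/**
 * Hypothetical sketch contrasting the progress-gap rule described above with
 * the completion-time condition this issue proposes. The 0.2 gap comes from
 * the description (read as an absolute gap); timeFactor is illustrative.
 */
final class SpeculationCheck {
    static final double PROGRESS_GAP = 0.2;

    /** Gap rule: at least 0.2 behind the average progress. */
    static boolean farBehindAverage(double progress, double avgProgress) {
        return avgProgress - progress >= PROGRESS_GAP;
    }

    /** Proposed addition: running much longer than completed tasks took on average. */
    static boolean runningTooLong(long elapsedMs, double avgCompletionMs, double timeFactor) {
        return avgCompletionMs > 0 && elapsedMs > timeFactor * avgCompletionMs;
    }

    /** Speculate if either condition holds. */
    static boolean shouldSpeculate(double progress, double avgProgress,
                                   long elapsedMs, double avgCompletionMs,
                                   double timeFactor) {
        return farBehindAverage(progress, avgProgress)
                || runningTooLong(elapsedMs, avgCompletionMs, timeFactor);
    }
}
```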

-- 
This message is automatically generated by JIRA.

