hadoop-common-dev mailing list archives

From "Arun C Murthy (JIRA)" <j...@apache.org>
Subject [jira] Commented: (HADOOP-2141) speculative execution start up condition based on completion time
Date Thu, 15 Nov 2007 19:57:43 GMT

    [ https://issues.apache.org/jira/browse/HADOOP-2141?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel#action_12542867 ]

Arun C Murthy commented on HADOOP-2141:
---------------------------------------

bq. Is there a strong reason for disabling it for maps?

To me it looks like there is a strong case for at least treating maps and reduces separately,
if not for disabling it for maps.

Maps mostly run on the order of minutes, while reduces take much longer since they are
waiting for maps to complete.
E.g., in sort500, maps take about a minute (outlier maps take 4 minutes) while reduces (at least
the first wave) take around an hour to complete.

Given these numbers (and I know job characteristics vary wildly), I'd like to be careful
and ensure we aren't too aggressive when launching speculative, speculative-tasks (man! does
that sound weird!).

Hence, and keeping in mind that reduces are more expensive to execute speculatively willy-nilly,
I propose 4 parameters, to keep it reasonably simple:

mapred.map.speculative.timegap = 2 x avg_map_completion_time
mapred.reduce.speculative.timegap = 1.5 x avg_reduce_completion_time
mapred.map.min.completion.for.speculation = 90
mapred.reduce.min.completion.for.speculation = 95
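
To make the proposal concrete, here is a minimal sketch of how the four parameters might combine into a single check. It rests on one possible reading that the comment itself does not spell out: the "timegap" is taken as a multiple of the average completion time that a running task must exceed, and "min.completion" as the percentage of tasks of that type that must already have finished. The class, method, and constant names are hypothetical and are not Hadoop APIs.

```java
// Hypothetical sketch only; none of these names exist in Hadoop.
final class SpeculationCheck {
    // Assumed reading of the proposed defaults.
    static final double MAP_TIMEGAP_FACTOR = 2.0;     // mapred.map.speculative.timegap
    static final double REDUCE_TIMEGAP_FACTOR = 1.5;  // mapred.reduce.speculative.timegap
    static final int MAP_MIN_COMPLETION_PCT = 90;     // mapred.map.min.completion.for.speculation
    static final int REDUCE_MIN_COMPLETION_PCT = 95;  // mapred.reduce.min.completion.for.speculation

    /**
     * Returns true if a running task looks like a candidate for speculation:
     * enough tasks of its type have finished, and it has been running longer
     * than the configured multiple of the average completion time.
     */
    static boolean shouldSpeculate(boolean isMap,
                                   long runningMillis,
                                   long avgCompletionMillis,
                                   int completedTasks,
                                   int totalTasks) {
        if (totalTasks <= 0 || completedTasks <= 0) {
            return false; // no average to compare against yet
        }
        double factor = isMap ? MAP_TIMEGAP_FACTOR : REDUCE_TIMEGAP_FACTOR;
        int minPct = isMap ? MAP_MIN_COMPLETION_PCT : REDUCE_MIN_COMPLETION_PCT;

        // Integer comparison avoids rounding issues: completed/total >= minPct/100.
        boolean enoughDone = (long) completedTasks * 100 >= (long) minPct * totalTasks;
        boolean overdue = runningMillis > factor * avgCompletionMillis;
        return enoughDone && overdue;
    }

    public static void main(String[] args) {
        // A 4-minute outlier map when maps average one minute and 95 of 100 are done.
        System.out.println(shouldSpeculate(true, 240_000L, 60_000L, 95, 100));
        // A reduce running 80 minutes when reduces average one hour and 96 of 100 are done.
        System.out.println(shouldSpeculate(false, 4_800_000L, 3_600_000L, 96, 100));
    }
}
```

Under this reading, the sort500 outlier map (4 minutes against a 1-minute average) would qualify once 90% of maps are done, while a reduce would have to run past 90 minutes against a 1-hour average.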

Thoughts?

> speculative execution start up condition based on completion time
> -----------------------------------------------------------------
>
>                 Key: HADOOP-2141
>                 URL: https://issues.apache.org/jira/browse/HADOOP-2141
>             Project: Hadoop
>          Issue Type: Improvement
>          Components: mapred
>    Affects Versions: 0.15.0
>            Reporter: Koji Noguchi
>            Assignee: Arun C Murthy
>             Fix For: 0.16.0
>
>
> We had one job with speculative execution hang.
> 4 reduce tasks were stuck with 95% completion because of a bad disk. 
> Devaraj pointed out 
> bq. One of the conditions that must be met for launching a speculative instance of a
> task is that it must be at least 20% behind the average progress, and this is not true here.
> It would be nice if speculative execution also starts up when tasks stop making progress.
> Devaraj suggested 
> bq. Maybe, we should introduce a condition for average completion time for tasks in the
> speculative execution check.

-- 
This message is automatically generated by JIRA.
-
You can reply to this email to add a comment to the issue online.

