hadoop-common-dev mailing list archives

From "Arun C Murthy (JIRA)" <j...@apache.org>
Subject [jira] Commented: (HADOOP-2141) speculative execution start up condition based on completion time
Date Tue, 16 Sep 2008 08:45:44 GMT

    [ https://issues.apache.org/jira/browse/HADOOP-2141?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12631301#action_12631301 ]

Arun C Murthy commented on HADOOP-2141:
---------------------------------------

Looking through this patch, a few comments:

# JobInProgress.getSpeculative{Map|Reduce} are both called from synchronized methods, i.e.
JobInProgress.findNew{Map|Reduce}Task; please mark them as synchronized too, just to
be future-proof.
# JobInProgress.findSpeculativeTask's 'shouldRemove' parameter is always passed in as 'false'
(from getSpeculative{Map|Reduce}) ... do we even need this parameter?
# JobInProgress.isTaskSlowEnoughToSpeculate reads mapred.speculative.execution.slowTaskThreshold
from the JobConf on every call - we should just cache it in a private variable. Ditto for
JobInProgress.isSlowTracker/mapred.speculative.execution.slowNodeThreshold
and JobInProgress.atSpeculativeCap/mapred.speculative.execution.speculativeCap. (Also please
remove the LOG.info for the config variable in JobInProgress.isTaskSlowEnoughToSpeculate.)
# JobInProgress.findSpeculativeTask gets a List of TIPs and then converts it
to a TIP[] for JobInProgress.isSlowTracker etc. - we should just get all APIs to work with
List<TIP> and do away with that conversion.
# Can we keep a running count of the 'progress' of each TaskTracker's tasks rather than recompute it
each time in JobInProgress.isSlowTracker? For large jobs the cost might be significant...
# JobInProgress.isTaskSlowEnoughToSpeculate really bothers me. It is called from inside a
loop (i.e. for each TIP) and it sorts the progress of each TIP. This is potentially very expensive.
At the very least we should sort the TIPs once, and even better - we should maintain a
PriorityQueue of TIPs based on their progress.
# I'm guessing that sorting 'candidate speculative tasks' in JobInProgress.findSpeculativeTask
isn't prohibitively expensive since the number of candidates is fairly small. Could you please
confirm?
# Minor: Please adhere to the 80 character limit per-line.
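
To make the caching, running-total and PriorityQueue suggestions above concrete, here is a
minimal, self-contained sketch. It is not the patch's code: Tip and SpeculationSketch are
hypothetical stand-ins, and the "behind the average by a fixed gap" test is only an assumed
rule, borrowed from the 20% condition quoted in the issue description.

```java
import java.util.ArrayList;
import java.util.Comparator;
import java.util.List;
import java.util.PriorityQueue;

/** Hypothetical stand-in for a TaskInProgress; only what the sketch needs. */
final class Tip {
    final String id;
    final double progress; // fraction complete, 0.0 .. 1.0

    Tip(String id, double progress) {
        this.id = id;
        this.progress = progress;
    }
}

/** Hypothetical sketch; names and the speculation rule are assumptions. */
final class SpeculationSketch {
    // Read from the configuration once, at construction, not per call.
    private final double cachedGap;

    // Running total, so the average does not need a pass over all TIPs.
    private double progressSum = 0.0;

    // Least-progressed TIP sits at the head of the queue.
    private final PriorityQueue<Tip> byProgress =
        new PriorityQueue<>(Comparator.comparingDouble((Tip t) -> t.progress));

    SpeculationSketch(double gapFromConfig) {
        this.cachedGap = gapFromConfig;
    }

    void add(Tip tip) {
        byProgress.add(tip);
        progressSum += tip.progress;
    }

    double averageProgress() {
        return byProgress.isEmpty() ? 0.0 : progressSum / byProgress.size();
    }

    /** Assumed rule: the TIP trails the average by at least the cached gap. */
    boolean isBehindAverage(Tip tip) {
        return tip.progress <= averageProgress() - cachedGap;
    }

    /** Returns up to n least-progressed TIPs without disturbing the queue. */
    List<Tip> slowest(int n) {
        PriorityQueue<Tip> copy = new PriorityQueue<>(byProgress);
        List<Tip> result = new ArrayList<>();
        while (result.size() < n && !copy.isEmpty()) {
            result.add(copy.poll());
        }
        return result;
    }
}
```

Tip is immutable here to keep the sketch short. With real TIPs, whose progress changes over
time, an entry would have to be removed and re-added (and the running total adjusted) on each
progress update, since a PriorityQueue does not re-order elements whose keys change in place.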

> speculative execution start up condition based on completion time
> -----------------------------------------------------------------
>
>                 Key: HADOOP-2141
>                 URL: https://issues.apache.org/jira/browse/HADOOP-2141
>             Project: Hadoop Core
>          Issue Type: Improvement
>          Components: mapred
>    Affects Versions: 0.19.0
>            Reporter: Koji Noguchi
>            Assignee: Andy Konwinski
>         Attachments: HADOOP-2141-v2.patch, HADOOP-2141.patch
>
>
> We had one job with speculative execution hang.
> 4 reduce tasks were stuck with 95% completion because of a bad disk. 
> Devaraj pointed out 
> bq. One of the conditions that must be met for launching a speculative instance of a task is that it must be at least 20% behind the average progress, and this is not true here.
> It would be nice if speculative execution also starts up when tasks stop making progress.
> Devaraj suggested 
> bq. Maybe, we should introduce a condition for average completion time for tasks in the speculative execution check.

-- 
This message is automatically generated by JIRA.
-
You can reply to this email to add a comment to the issue online.

