hadoop-common-dev mailing list archives

From "Devaraj Das (JIRA)" <j...@apache.org>
Subject [jira] Commented: (HADOOP-2141) speculative execution start up condition based on completion time
Date Mon, 27 Apr 2009 18:20:30 GMT

    [ https://issues.apache.org/jira/browse/HADOOP-2141?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12703265#action_12703265 ]

Devaraj Das commented on HADOOP-2141:
-------------------------------------

Eric, I am not sure how much we would gain from the locality aspect. Considering that
we will launch very few tasks speculatively, the probability that a TT comes in
and gets a data-local spec task would be quite low, IMO. But yes, it makes sense to keep the
existing logic for running node/rack-local speculative tasks. So I'd suggest something
like:
if (TT is not slow) {
  if (exists node-local task that is running slower than others) {
    assign that task to the TT
  } else if (a task is available from the higher-level rack cache) {
    assign that task to the TT
  } else {
    look at the entire list of running TIPs to find a slow task
  }
}
The above is essentially the same as what happens in today's trunk. The only additional constraint
we are adding here is the check for whether a TT is GOOD (meets the criteria for running spec
tasks).
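
To make the fallback order concrete, here is a minimal Java sketch of the selection logic above. It is not the JobTracker code from trunk or from any of the attached patches: SpecTaskPicker, Tip, firstSlow and pick are hypothetical names, the "slow" flag stands in for whatever slowness test is finally chosen, and the rack-level step is assumed to look for a slow task as well.

```java
import java.util.List;

// Hypothetical sketch of the suggested selection order.
// None of these names come from Hadoop; they exist only for illustration.
final class SpecTaskPicker {

    /** Minimal stand-in for a running TIP (task in progress). */
    static final class Tip {
        final String id;
        final boolean slow; // assumed result of some "running slower than others" test

        Tip(String id, boolean slow) {
            this.id = id;
            this.slow = slow;
        }
    }

    /** Returns the first slow TIP in the list, or null if there is none. */
    static Tip firstSlow(List<Tip> tips) {
        for (Tip t : tips) {
            if (t.slow) {
                return t;
            }
        }
        return null;
    }

    /**
     * Picks a TIP to run speculatively on a tracker, or null if none qualifies.
     * A slow tracker never receives a speculative task; otherwise the search
     * goes node-local first, then rack-local, then all running TIPs.
     */
    static Tip pick(boolean trackerIsSlow,
                    List<Tip> nodeLocal,
                    List<Tip> rackLocal,
                    List<Tip> allRunning) {
        if (trackerIsSlow) {
            return null; // the one new constraint: the TT must be GOOD
        }
        Tip t = firstSlow(nodeLocal);
        if (t == null) {
            t = firstSlow(rackLocal);
        }
        if (t == null) {
            t = firstSlow(allRunning);
        }
        return t;
    }
}
```

With the tracker check removed, the three-level search is the part that matches the existing locality logic; the early return is the only addition.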


> speculative execution start up condition based on completion time
> -----------------------------------------------------------------
>
>                 Key: HADOOP-2141
>                 URL: https://issues.apache.org/jira/browse/HADOOP-2141
>             Project: Hadoop Core
>          Issue Type: Improvement
>          Components: mapred
>    Affects Versions: 0.21.0
>            Reporter: Koji Noguchi
>            Assignee: Andy Konwinski
>         Attachments: 2141.patch, HADOOP-2141-v2.patch, HADOOP-2141-v3.patch, HADOOP-2141-v4.patch,
>                      HADOOP-2141-v5.patch, HADOOP-2141-v6.patch, HADOOP-2141.patch
>
>
> We had one job with speculative execution hang.
> 4 reduce tasks were stuck with 95% completion because of a bad disk. 
> Devaraj pointed out 
> bq. One of the conditions that must be met for launching a speculative instance of a
> task is that it must be at least 20% behind the average progress, and this is not true here.
> It would be nice if speculative execution also starts up when tasks stop making progress.
> Devaraj suggested 
> bq. Maybe, we should introduce a condition for average completion time for tasks in the
> speculative execution check.
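
The gap described in the issue can be illustrated with a small Java sketch. Only the 20% figure comes from the text above; reading it as an absolute difference in progress (0.2 on a 0..1 scale) is an assumption, and SpeculationCheck, stalled and stallLimitMs are hypothetical names for one possible "stopped making progress" test, not anything taken from the attached patches.

```java
// Hypothetical sketch: the progress-gap condition plus an assumed stall check.
final class SpeculationCheck {

    // The 20% from the issue text, read here as an absolute gap in progress.
    static final double SPECULATIVE_GAP = 0.2;

    /** True if the task is at least 20% behind the average progress. */
    static boolean behindAverage(double progress, double averageProgress) {
        return progress <= averageProgress - SPECULATIVE_GAP;
    }

    /** Assumed extra condition: no progress report for longer than stallLimitMs. */
    static boolean stalled(long nowMs, long lastProgressMs, long stallLimitMs) {
        return nowMs - lastProgressMs > stallLimitMs;
    }

    /** Speculate if the task is far behind the average or has stopped progressing. */
    static boolean shouldSpeculate(double progress, double averageProgress,
                                   long nowMs, long lastProgressMs, long stallLimitMs) {
        return behindAverage(progress, averageProgress)
                || stalled(nowMs, lastProgressMs, stallLimitMs);
    }
}
```

A reduce stuck at 95% while the average is 97% fails the gap test, which is the hang reported here; under the assumed stall check it would still qualify once it had gone longer than stallLimitMs without reporting progress.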

-- 
This message is automatically generated by JIRA.
-
You can reply to this email to add a comment to the issue online.

