hadoop-common-dev mailing list archives

From "Andy Konwinski (JIRA)" <j...@apache.org>
Subject [jira] Updated: (HADOOP-2141) speculative execution start up condition based on completion time
Date Fri, 27 Mar 2009 01:50:51 GMT

     [ https://issues.apache.org/jira/browse/HADOOP-2141?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Andy Konwinski updated HADOOP-2141:
-----------------------------------

    Affects Version/s:     (was: 0.19.0)
                       0.21.0
               Status: Patch Available  (was: Open)

Responding to Devaraj's comments:

"The field TaskInProgress.mostRecentStartTime is updated with the same value of execStartTime
each time (since execStartTime is updated only once in the life of the TIP). Did you mean
to do this?"

No, good catch. mostRecentStartTime should be updated with the current time each time getTaskToRun
is called. I have made this change.
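Below is a minimal sketch of the corrected bookkeeping. It assumes a simplified TIP. `TipStartTimes`, `onGetTaskToRun()` and the injectable clock are hypothetical names, not code from the patch. The sketch shows `execStartTime` being written only once, while `mostRecentStartTime` is overwritten with the current time on every call.

```java
import java.util.function.LongSupplier;

// Hypothetical, simplified stand-in for TaskInProgress: only the two start
// timestamps discussed in this comment are modeled.
class TipStartTimes {
    private final LongSupplier clock;     // injectable time source (assumption, eases testing)
    private boolean launched = false;     // has any attempt been handed out yet?
    private long execStartTime;           // set once, on the first launch
    private long mostRecentStartTime;     // refreshed on every launch

    TipStartTimes(LongSupplier clock) {
        this.clock = clock;
    }

    // Plays the role of getTaskToRun(): called each time an attempt is handed out.
    void onGetTaskToRun() {
        long now = clock.getAsLong();
        if (!launched) {
            execStartTime = now;          // only the first attempt sets this
            launched = true;
        }
        mostRecentStartTime = now;        // every attempt, including speculative ones
    }

    long getExecStartTime() { return execStartTime; }

    long getMostRecentStartTime() { return mostRecentStartTime; }
}
```

The clock is injected in place of `System.currentTimeMillis()` only so the sketch is deterministic to test. Real code could read the system clock directly.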

"They should be decremented in TIP.incompleteSubTask and TIP.completedTask (basically, places
where activeTasks.remove is done). The decrement should happen if activeTasks.size for the
TIP is >1. Makes sense?"

Thanks to Devaraj for writing the decrementSpeculativeCount() function, which is called from
failedTask() and completedTask(). I have also replaced the countSpeculating() call in
atSpeculativeCap() with the sum speculativeMapTasks + speculativeReduceTasks.
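The counter scheme can be sketched as follows. It assumes a simplified job and TIP. `SpeculativeCounters`, `SpeculativeTip`, `launchAttempt`, `removeAttempt` and the `cap` parameter are hypothetical names; only `activeTasks`, `decrementSpeculativeCount` and the two counter fields come from the discussion above.

- An attempt launched while another is running counts as speculative.
- Removal decrements only while `activeTasks.size() > 1`, so the counts stay balanced.
- `atSpeculativeCap()` becomes a constant-time sum.

```java
import java.util.HashSet;
import java.util.Set;

// Hypothetical job-level counters for running speculative attempts.
class SpeculativeCounters {
    int speculativeMapTasks = 0;
    int speculativeReduceTasks = 0;
    private final int cap; // assumed limit on concurrent speculative attempts

    SpeculativeCounters(int cap) {
        this.cap = cap;
    }

    // O(1) check: sums the two counters instead of scanning every TIP.
    boolean atSpeculativeCap() {
        return speculativeMapTasks + speculativeReduceTasks >= cap;
    }

    void incrementSpeculativeCount(boolean isMap) {
        if (isMap) speculativeMapTasks++; else speculativeReduceTasks++;
    }

    void decrementSpeculativeCount(boolean isMap) {
        if (isMap) speculativeMapTasks--; else speculativeReduceTasks--;
    }
}

// Hypothetical TIP holding the IDs of its running attempts in activeTasks.
class SpeculativeTip {
    private final boolean isMap;
    private final SpeculativeCounters counters;
    final Set<String> activeTasks = new HashSet<>();

    SpeculativeTip(boolean isMap, SpeculativeCounters counters) {
        this.isMap = isMap;
        this.counters = counters;
    }

    // Any attempt launched while another is already running counts as speculative.
    void launchAttempt(String attemptId) {
        if (!activeTasks.isEmpty()) {
            counters.incrementSpeculativeCount(isMap);
        }
        activeTasks.add(attemptId);
    }

    // Shared by the failure and completion paths: decrement only while more
    // than one attempt is active, so counts stay balanced with launchAttempt.
    void removeAttempt(String attemptId) {
        if (activeTasks.size() > 1 && activeTasks.contains(attemptId)) {
            counters.decrementSpeculativeCount(isMap);
        }
        activeTasks.remove(attemptId);
    }
}
```

When a TIP with two running attempts finishes, the first removal drops the counter back to zero. The second removal leaves it alone, because only one attempt remains at that point.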

"Couldn't it be checked whether TIP.isComplete() returns true before launching a speculative
attempt?"

Yes, I think this could be done as an optimization. It would add a little complexity, though,
and before making many more changes it would be good to test the current functionality.
Again, it would be nice if a few people could test the performance impact of this patch
at scale.
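The suggested optimization could take the shape below. It assumes the speculation criteria are expensive enough to be worth skipping. `CandidateTip`, `SpeculativePicker` and the `speculationTest` predicate are hypothetical. The cheap `isComplete()`-style check runs first, so finished TIPs never reach the costlier progress and time tests.

```java
import java.util.List;
import java.util.Optional;
import java.util.function.Predicate;

// Hypothetical, simplified view of a TIP for the purposes of this check.
class CandidateTip {
    final String id;
    final boolean complete; // what TIP.isComplete() would report

    CandidateTip(String id, boolean complete) {
        this.id = id;
        this.complete = complete;
    }
}

final class SpeculativePicker {
    private SpeculativePicker() {}

    // Returns the first TIP that passes speculationTest, skipping finished ones
    // before the (assumed costlier) test is ever evaluated.
    static Optional<CandidateTip> pick(List<CandidateTip> tips,
                                       Predicate<CandidateTip> speculationTest) {
        for (CandidateTip tip : tips) {
            if (tip.complete) {
                continue; // a finished TIP never needs a backup attempt
            }
            if (speculationTest.test(tip)) {
                return Optional.of(tip);
            }
        }
        return Optional.empty();
    }
}
```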

> speculative execution start up condition based on completion time
> -----------------------------------------------------------------
>
>                 Key: HADOOP-2141
>                 URL: https://issues.apache.org/jira/browse/HADOOP-2141
>             Project: Hadoop Core
>          Issue Type: Improvement
>          Components: mapred
>    Affects Versions: 0.21.0
>            Reporter: Koji Noguchi
>            Assignee: Andy Konwinski
>         Attachments: 2141.patch, HADOOP-2141-v2.patch, HADOOP-2141-v3.patch, HADOOP-2141-v4.patch,
>                      HADOOP-2141-v5.patch, HADOOP-2141-v6.patch, HADOOP-2141.patch
>
>
> We had one job hang even though speculative execution was enabled.
> 4 reduce tasks were stuck at 95% completion because of a bad disk.
> Devaraj pointed out:
> bq. One of the conditions that must be met for launching a speculative instance of a
> task is that it must be at least 20% behind the average progress, and this is not true here.
> It would be nice if speculative execution also started up when tasks stop making progress.
> Devaraj suggested:
> bq. Maybe we should introduce a condition for average completion time for tasks in the
> speculative execution check.
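The two start-up conditions in the description can be contrasted in a short sketch. Several pieces are assumptions made for illustration: `SpeculationConditions`, `PROGRESS_GAP` (read as an absolute 0.2 gap in progress score), the `slack` factor, and the linear extrapolation of finish time. This is one possible completion-time check, not necessarily the one the attached patches implement.

A task stuck at 95% fails the progress-gap test when its peers are close to 100%. Its extrapolated finish time, however, keeps growing for as long as it makes no progress.

```java
// Hypothetical illustration of the two start-up conditions discussed in this issue.
// Progress values are in [0, 1]; times are in milliseconds.
final class SpeculationConditions {
    // Assumed reading of "at least 20% behind the average progress":
    // an absolute gap of 0.2 in the progress score.
    static final double PROGRESS_GAP = 0.2;

    private SpeculationConditions() {}

    // Progress-gap check: the task must lag the average by at least the gap.
    static boolean behindAverageProgress(double progress, double averageProgress) {
        return averageProgress - progress >= PROGRESS_GAP;
    }

    // Hypothetical completion-time check: extrapolate the task's total duration
    // from its observed progress rate and compare it with the average duration
    // of tasks that have already completed, scaled by a slack factor.
    static boolean slowerThanAverageCompletion(double progress, long startTime, long now,
                                               double avgCompletedDurationMs, double slack) {
        long elapsed = now - startTime;
        if (elapsed <= 0 || avgCompletedDurationMs <= 0) {
            return false; // not enough information yet
        }
        if (progress <= 0) {
            // No progress at all: compare elapsed time directly.
            return elapsed > slack * avgCompletedDurationMs;
        }
        double estimatedTotalMs = elapsed / progress; // linear extrapolation
        return estimatedTotalMs > slack * avgCompletedDurationMs;
    }
}
```

Some illustrative numbers (not from the report) show the difference. Take a reducer stuck at 0.95 whose peers average 0.99. It is only 0.04 behind, so the gap test never fires. Suppose it has run for 30 minutes against a 10-minute average task duration, with a slack of 1.5. Its extrapolated total is about 31.6 minutes, which exceeds 15 minutes, so the time-based test fires.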

-- 
This message is automatically generated by JIRA.
-
You can reply to this email to add a comment to the issue online.

