hadoop-mapreduce-issues mailing list archives

From "Matei Zaharia (JIRA)" <j...@apache.org>
Subject [jira] Commented: (MAPREDUCE-2162) speculative execution does not handle cases where stddev > mean well
Date Tue, 02 Nov 2010 19:26:28 GMT

    [ https://issues.apache.org/jira/browse/MAPREDUCE-2162?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12927559#action_12927559 ]

Matei Zaharia commented on MAPREDUCE-2162:

This sounds pretty good, Joydeep. Just wanted to make one comment: even if we use, say, the
90th percentile to determine the "normal" range for tasks, it may not mean that we'll speculate
10% of tasks all the time. The reason for this is the other speculation criteria -- speculative
lag and R < M. In particular, if we let each task run for some amount of time (say half
the average task length) before deciding on whether to speculate it, then it's likely that
even for many of these 90th percentile tasks, R will be less than M so we will not want to
speculate. This might be enough to get a good speculation algorithm (that only speculates
a few percent of the tasks and almost always picks ones that will beat their original tasks)
without having to do the cost-benefit estimation at first.
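The interplay between the percentile threshold and the speculative-lag check described above could be sketched roughly as follows. This is not the actual Hadoop Speculator code; the method name `shouldSpeculate`, the percentile-rank arithmetic, and the half-average-runtime lag are all illustrative assumptions based on the comment:

```java
import java.util.Arrays;

public class SpeculationSketch {
    // Hypothetical sketch of the criterion discussed above: a task is a
    // speculation candidate only if (a) it has already run for at least
    // half the average task length (the "speculative lag"), and (b) its
    // progress rate falls below roughly the 10th-percentile rate of its
    // peers, i.e. it is slower than ~90% of tasks in the job.
    static boolean shouldSpeculate(double[] progressRates, int taskIndex,
                                   long runMillis, long avgTaskMillis) {
        if (runMillis < avgTaskMillis / 2) {
            return false;                       // speculative lag not yet met
        }
        double[] sorted = progressRates.clone();
        Arrays.sort(sorted);
        // Value at roughly the 10th-percentile rank; tasks below it are
        // candidates for speculation.
        int rank = (int) Math.ceil(0.10 * (sorted.length - 1));
        double floor = sorted[rank];
        return progressRates[taskIndex] < floor;
    }

    public static void main(String[] args) {
        double[] rates = {1.0, 0.9, 1.1, 0.95, 1.05, 0.02}; // one laggard
        // Laggard has run long enough: candidate for speculation.
        System.out.println(shouldSpeculate(rates, 5, 600_000, 1_000_000)); // true
        // Same laggard, but too early in its run: the lag check blocks it.
        System.out.println(shouldSpeculate(rates, 5, 100_000, 1_000_000)); // false
    }
}
```

Even under a fixed percentile cutoff, the lag check alone filters out most would-be speculations, which is Matei's point about not speculating 10% of tasks all the time.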

> speculative execution does not handle cases where stddev > mean well
> --------------------------------------------------------------------
>                 Key: MAPREDUCE-2162
>                 URL: https://issues.apache.org/jira/browse/MAPREDUCE-2162
>             Project: Hadoop Map/Reduce
>          Issue Type: Bug
>            Reporter: Joydeep Sen Sarma
> the new speculation code only speculates tasks whose progress rate deviates from the
mean progress rate of a job by more than some multiple (typically 1.0) of stddev. stddev can
be larger than the mean, which means that if we ever get into a situation where this condition
holds true, then a task with even a 0 progress rate will not be speculated.
> it's not clear that this condition is self-correcting. if a job has thousands of tasks,
then one laggard task, in spite of not being speculated for a long time, may not be able
to fix the condition of stddev > mean.
> we have seen jobs where tasks have not been speculated for hours, and this seems to be one
explanation for why that may have happened. here's an example job with stddev > mean:
> DataStatistics: count is 6, sum is 1.7141054797775723E-8, sumSquares is 2.9381575958035014E-16
mean is 2.8568424662959537E-9 std() is 6.388093955645905E-9
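Plugging the DataStatistics numbers quoted in the issue into the mean-minus-stddev criterion shows the pathology directly: the threshold goes negative, so no task, not even one making zero progress, can ever fall below it. A minimal sketch, assuming the criterion is "speculate if progressRate < mean - 1.0 * stddev" as the issue describes:

```java
public class StddevGtMeanDemo {
    public static void main(String[] args) {
        // Values quoted verbatim from the issue's DataStatistics line.
        double mean = 2.8568424662959537E-9;
        double std  = 6.388093955645905E-9;

        // Criterion per the issue: speculate a task whose progress rate
        // falls below mean - 1.0 * stddev. With stddev > mean this
        // threshold is negative.
        double threshold = mean - 1.0 * std;
        System.out.println("threshold = " + threshold);

        // A completely stuck task (zero progress rate) still fails the
        // check, because no non-negative rate is below a negative threshold.
        double stuckTaskRate = 0.0;
        System.out.println("speculated? " + (stuckTaskRate < threshold)); // false
    }
}
```

Since progress rates are never negative, the condition is unsatisfiable whenever stddev > mean, which matches the reported behavior of jobs going hours without speculating any task.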

This message is automatically generated by JIRA.
You can reply to this email to add a comment to the issue online.
