hadoop-mapreduce-issues mailing list archives

From "Joydeep Sen Sarma (JIRA)" <j...@apache.org>
Subject [jira] Commented: (MAPREDUCE-2162) speculative execution does not handle cases where stddev > mean well
Date Thu, 02 Dec 2010 00:24:16 GMT

    [ https://issues.apache.org/jira/browse/MAPREDUCE-2162?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12965936#action_12965936 ]

Joydeep Sen Sarma commented on MAPREDUCE-2162:

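here's the reasoning behind capping stddev at mean/3. we speculate if:
* rate < mean - stddev

which (as long as stddev < mean) is the same as:
* 1/rate > 1/(mean - stddev)

* 1/rate > 1/mean + (1/(mean - stddev) - 1/mean)

taking projectedTime = 1/rate and MeanTime = 1/mean, this is:
# projectedTime > MeanTime + Delta

where:
* Delta = (1/(mean - stddev) - 1/mean)

Delta grows with stddev, so if we cap:
* stddev <= mean/3 // for example

then Delta is bounded from above:
* Delta <= (1/(mean - mean/3) - 1/mean) ==>
* Delta <= 0.5/mean = 0.5 * MeanTime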

now, at the cap, our equation _1_ becomes:
# projectedTime > MeanTime + 0.5*MeanTime
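
a minimal numeric sketch of the derivation above, assuming projectedTime = 1/rate and MeanTime = 1/mean. the function name, the cap_fraction parameter and the sample rates are hypothetical and are not taken from the Hadoop source:

```python
def projected_time_threshold(mean_rate, stddev_rate, cap_fraction=1.0 / 3.0):
    """Projected time above which `rate < mean - stddev` fires, with stddev capped."""
    # cap stddev at cap_fraction * mean before applying the rate check
    capped_stddev = min(stddev_rate, cap_fraction * mean_rate)
    # rate < mean - stddev  <=>  1/rate > 1/(mean - stddev)
    return 1.0 / (mean_rate - capped_stddev)


mean_rate = 2.0              # hypothetical mean progress rate
mean_time = 1.0 / mean_rate  # MeanTime in the derivation

# thresholds for a few stddev values, including one far above the mean
thresholds = {
    stddev: projected_time_threshold(mean_rate, stddev)
    for stddev in (0.0, 0.5, 2.0 / 3.0, 5.0)
}
for stddev, threshold in thresholds.items():
    ratio = threshold / mean_time
    print(f"stddev={stddev:.3f}: speculate if projectedTime > {ratio:.3f} * MeanTime")
```

with the cap in place the threshold never exceeds 1.5 * MeanTime, no matter how large the measured stddev is.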

two observations:

* by capping stddev - we have converted the rate check into a meaningful check on the running
time of a task - tasks whose projected time exceeds a certain bound (relative to the mean) are
guaranteed to be speculated.
* the MeanTime + 0.5*MeanTime slack over the mean is the same as the heuristic discussed in the
jira where two rules were discussed:
** don't speculate if runningTime <= MeanTime * 0.5
** don't speculate if remainingTime < MeanTime
* if we add these two together - since runningTime + remainingTime == projectedTime - this
becomes (roughly):
** speculate only if projectedTime > MeanTime + MeanTime*0.5

so the heuristics in the jira are structurally similar to capping the stddev at mean/3.
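
to make the "roughly" concrete, here is a small sketch comparing the two jira rules with the single projected-time rule. the function names and sample values are hypothetical, and this is an illustration of the arithmetic rather than the actual scheduler code:

```python
def passes_jira_rules(running_time, remaining_time, mean_time):
    """Both rules from the jira discussion, applied as vetoes."""
    if running_time <= 0.5 * mean_time:
        return False  # rule 1: task has not run long enough
    if remaining_time < mean_time:
        return False  # rule 2: task is close to finishing
    return True


def passes_projected_rule(running_time, remaining_time, mean_time):
    """Single rule obtained by capping stddev at mean/3."""
    projected_time = running_time + remaining_time
    return projected_time > 1.5 * mean_time


mean_time = 10.0  # hypothetical MeanTime
samples = [(6.0, 12.0), (4.0, 12.0), (6.0, 8.0), (1.0, 20.0)]
for running, remaining in samples:
    jira = passes_jira_rules(running, remaining, mean_time)
    projected = passes_projected_rule(running, remaining, mean_time)
    print(f"running={running} remaining={remaining} jira={jira} projected={projected}")
```

any task that passes both jira rules also passes the projected-time rule, but not the other way round: a task that has barely started and has a long remaining time passes only the projected-time rule.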

as explained earlier - the percentile stuff is actually (approximately) being done by speculativeCap
(no more than 10% of the tasks can be speculated and tasks are sorted (by latest finish time)
before speculating).
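
a rough sketch of how such a cap approximates a percentile cut. the function name, the (task_id, projected_finish_time) tuples and the rounding of the cap are assumptions made for illustration; the real speculativeCap logic may differ in its details:

```python
def pick_speculation_candidates(candidates, total_tasks, cap_percent=10):
    """Return the task ids to speculate, latest projected finish first.

    candidates: list of (task_id, projected_finish_time) that already
    passed the rate check.
    """
    # at most cap_percent of all tasks may be speculated (rounded down here)
    limit = total_tasks * cap_percent // 100
    # sort by projected finish time, latest first, so the worst laggards win
    ranked = sorted(candidates, key=lambda c: c[1], reverse=True)
    return [task_id for task_id, _ in ranked[:limit]]


candidates = [("t1", 120.0), ("t2", 300.0), ("t3", 90.0), ("t4", 250.0)]
chosen = pick_speculation_candidates(candidates, total_tasks=20)
print(chosen)
```

because only the slowest few candidates survive the cut, the cap behaves like a filter on the top percentile of projected finish times.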

> speculative execution does not handle cases where stddev > mean well
> --------------------------------------------------------------------
>                 Key: MAPREDUCE-2162
>                 URL: https://issues.apache.org/jira/browse/MAPREDUCE-2162
>             Project: Hadoop Map/Reduce
>          Issue Type: Bug
>            Reporter: Joydeep Sen Sarma
>            Assignee: Joydeep Sen Sarma
> the new speculation code only speculates tasks whose progress rate deviates from the
> mean progress rate of a job by more than some multiple (typically 1.0) of stddev. stddev can
> be larger than mean, which means that if we ever get into a situation where this condition
> holds true - then a task with even 0 progress rate will not be speculated.
> it's not clear that this condition is self-correcting. if a job has thousands of tasks,
> then one laggard task, in spite of not being speculated for a long time, may not be able
> to fix the condition of stddev > mean.
> we have seen jobs where tasks have not been speculated for hours and this seems to be one
> explanation for why this may have happened. here's an example job with stddev > mean:
> DataStatistics: count is 6, sum is 1.7141054797775723E-8, sumSquares is 2.9381575958035014E-16
> mean is 2.8568424662959537E-9 std() is 6.388093955645905E-9
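
the logged numbers can be checked with a short sketch. it assumes that std() is the population standard deviation computed from count, sum and sumSquares (this reproduces the logged value), and should_speculate is a hypothetical stand-in for the rate check, not the actual Hadoop method:

```python
import math

# values copied from the DataStatistics log line
count = 6
total = 1.7141054797775723e-8
sum_squares = 2.9381575958035014e-16

mean = total / count
# population variance: E[x^2] - (E[x])^2, clamped at 0 against rounding
std = math.sqrt(max(sum_squares / count - mean * mean, 0.0))


def should_speculate(rate, mean, std, num_stddevs=1.0):
    """Rate check from the issue: speculate if rate < mean - k * stddev."""
    return rate < mean - num_stddevs * std


# a completely stalled task (rate 0) against the uncapped and capped stddev
stalled_uncapped = should_speculate(0.0, mean, std)
stalled_capped = should_speculate(0.0, mean, min(std, mean / 3.0))

print(f"mean={mean} std={std}")
print(f"threshold without cap: {mean - std}")  # negative, so no rate is below it
print(f"stalled task speculated: uncapped={stalled_uncapped} capped={stalled_capped}")
```

since std is larger than mean here, mean - std is negative and even a rate of 0 fails the check; capping stddev at mean/3 moves the threshold to 2/3 of the mean, so the stalled task qualifies.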

