hadoop-mapreduce-issues mailing list archives

From "Joydeep Sen Sarma (JIRA)" <j...@apache.org>
Subject [jira] Commented: (MAPREDUCE-2162) speculative execution does not handle cases where stddev > mean well
Date Wed, 01 Dec 2010 09:17:36 GMT

    [ https://issues.apache.org/jira/browse/MAPREDUCE-2162?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12965624#action_12965624 ]

Joydeep Sen Sarma commented on MAPREDUCE-2162:
----------------------------------------------

spent a lot of time coding and thinking about this. i am more inclined to make a simple change
to cap the standardDeviation at some maximum value (say mean/3).
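
A minimal sketch of what such a cap could look like, written in Python rather than as the actual Hadoop code. The function name `is_slow`, its parameters, and the exact form of the test (`rate < mean - multiple * stddev`) are assumptions based on the issue description quoted below; the numbers are the ones from the DataStatistics line in that description.

```python
def is_slow(rate, mean, stddev, multiple=1.0, cap_fraction=None):
    """Return True if a task's progress rate is far enough below the mean to speculate.

    Hypothetical helper: cap_fraction=1/3 caps stddev at mean/3 as proposed above;
    cap_fraction=None mimics the uncapped behaviour described in the issue.
    """
    if cap_fraction is not None:
        stddev = min(stddev, mean * cap_fraction)
    return rate < mean - multiple * stddev


# Values taken from the example job in the issue description (stddev > mean).
mean = 2.8568424662959537e-9
stddev = 6.388093955645905e-9

# Uncapped: mean - stddev is negative, so even a task with zero progress rate
# never falls below the threshold.
uncapped = is_slow(0.0, mean, stddev)

# Capped at mean/3: the threshold becomes 2/3 of the mean, so the stalled task
# qualifies for speculation.
capped = is_slow(0.0, mean, stddev, cap_fraction=1 / 3)
```

With the cap in place the threshold can never drop below two thirds of the mean (for a multiple of 1.0), so a stalled task is always eligible.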

i did a detailed analysis that seems to suggest that doing so would be roughly equivalent
to the scheme discussed above. we already have the notion of a 'speculative cap' - putting
a speculative cap of 10% of the currently running tasks would be roughly equivalent to speculating
the bottom 10%. (the LateComparator currently sorts speculatable tasks by remaining time rather
than by progress rate. if it were to sort by progress rate, it would be very similar to
speculating the bottom 10%.)
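
To illustrate the "bottom 10%" idea, here is a rough sketch that sorts running tasks by progress rate and keeps only as many as a 10% cap allows. It is not the LateComparator; the function name, the `cap_percent` parameter, and the dictionary input are hypothetical, and it ignores the separate check for whether a task is speculatable at all.

```python
def pick_speculation_candidates(progress_rates, cap_percent=10):
    """Pick the slowest running tasks, limited to cap_percent of all running tasks.

    progress_rates: dict mapping a task id to its progress rate (hypothetical input).
    """
    # Integer arithmetic avoids floating-point rounding in the cap.
    limit = len(progress_rates) * cap_percent // 100
    # Slowest progress rate first.
    slowest_first = sorted(progress_rates, key=progress_rates.get)
    return slowest_first[:limit]


# 20 running tasks with progress rates 0.0, 1.0, ..., 19.0 (made-up values).
rates = {f"task_{i}": float(i) for i in range(20)}
candidates = pick_speculation_candidates(rates)
```

With 20 running tasks and a 10% cap, two tasks are selected, and they are the two with the lowest progress rates.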

the conditions discussed here (runningTime >= mean/2 and remainingTime < mean) are roughly
equivalent to the current code (modulo the LateComparator) if stddev is capped at mean/3 (it's
a somewhat long deduction).

> speculative execution does not handle cases where stddev > mean well
> --------------------------------------------------------------------
>
>                 Key: MAPREDUCE-2162
>                 URL: https://issues.apache.org/jira/browse/MAPREDUCE-2162
>             Project: Hadoop Map/Reduce
>          Issue Type: Bug
>            Reporter: Joydeep Sen Sarma
>            Assignee: Joydeep Sen Sarma
>
> the new speculation code only speculates tasks whose progress rate deviates from the
> mean progress rate of a job by more than some multiple (typically 1.0) of stddev. stddev can
> be larger than mean, which means that if we ever get into a situation where this condition
> holds true, then a task with even 0 progress rate will not be speculated.
> it's not clear that this condition is self-correcting. if a job has thousands of tasks,
> then one laggard task, in spite of not being speculated for a long time, may not be able
> to fix the condition of stddev > mean.
> we have seen jobs where tasks have not been speculated for hours and this seems to be one
> explanation for why this may have happened. here's an example job with stddev > mean:
> DataStatistics: count is 6, sum is 1.7141054797775723E-8, sumSquares is 2.9381575958035014E-16
> mean is 2.8568424662959537E-9 std() is 6.388093955645905E-9
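
The logged figures can be checked by hand. The sketch below recomputes the mean and standard deviation from count, sum, and sumSquares, assuming the population formula sqrt(sumSquares/count - mean^2); that formula is an assumption, chosen because it reproduces the logged std() value, not a quote of the DataStatistics source.

```python
import math

# Values copied from the DataStatistics log line in the issue description.
count = 6
total = 1.7141054797775723e-8
sum_squares = 2.9381575958035014e-16

mean = total / count
# Population variance from running sums; clamp at 0 to guard against rounding.
variance = max(sum_squares / count - mean * mean, 0.0)
std = math.sqrt(variance)

# The speculation threshold with a multiple of 1.0 is mean - std, which is
# negative here, so no non-negative progress rate can fall below it.
threshold = mean - std
```

Here std comes out at a bit more than twice the mean, so the threshold is negative, which matches the failure mode the issue describes.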


