hadoop-mapreduce-issues mailing list archives

Site index · List index
Message view « Date » · « Thread »
Top « Date » · « Thread »
From "Matei Zaharia (JIRA)" <j...@apache.org>
Subject [jira] Commented: (MAPREDUCE-2162) speculative execution does not handle cases where stddev > mean well
Date Thu, 28 Oct 2010 05:02:06 GMT

    [ https://issues.apache.org/jira/browse/MAPREDUCE-2162?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12925692#action_12925692
] 

Matei Zaharia commented on MAPREDUCE-2162:
------------------------------------------

This is interesting. The original LATE algorithm actually used a percentile (for example,
the 25th percentile) instead of a standard deviation to determine tasks that are "slow enough".
That guarantees that there are always tasks that are considered "slow enough". Maybe we should
go back to that. I believe that Arun and Andy changed it to the current approach in HADOOP-2141
because they thought that the stddev would be more robust and they were afraid that computing
the 25th percentile takes time. However, it doesn't take that long and it only needs to be
computed periodically.

> speculative execution does not handle cases where stddev > mean well
> --------------------------------------------------------------------
>
>                 Key: MAPREDUCE-2162
>                 URL: https://issues.apache.org/jira/browse/MAPREDUCE-2162
>             Project: Hadoop Map/Reduce
>          Issue Type: Bug
>            Reporter: Joydeep Sen Sarma
>
> the new speculation code only speculates tasks whose progress rate deviates from the
mean progress rate of a job by more than some multiple (typically 1.0) of stddev. stddev can
be larger than mean. which means that if we ever get into a situation where this condition
holds true - then a task with even 0 progress rate will not be speculated.
> it's not clear that this condition is self-correcting. if a job has thousands of tasks
- then one laggard task, inspite of not being speculated for a long time, may not be able
to fix the condition of stddev > mean.
> we have seen jobs where tasks have not been speculated for hours and this seems one explanation
why this may have happened. here's an example job with stddev > mean:
> DataStatistics: count is 6, sum is 1.7141054797775723E-8, sumSquares is 2.9381575958035014E-16
mean is 2.8568424662959537E-9 std() is 6.388093955645905E-9

-- 
This message is automatically generated by JIRA.
-
You can reply to this email to add a comment to the issue online.


Mime
View raw message