hadoop-mapreduce-issues mailing list archives

From "Ahmed Hussein (Jira)" <j...@apache.org>
Subject [jira] [Commented] (MAPREDUCE-7208) Tuning TaskRuntimeEstimator
Date Tue, 29 Oct 2019 20:28:00 GMT

    [ https://issues.apache.org/jira/browse/MAPREDUCE-7208?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16962437#comment-16962437 ]

Ahmed Hussein commented on MAPREDUCE-7208:

Thanks [~jeagles]. I looked at the test cases:
* {{hadoop.mapreduce.v2.TestSpeculativeExecutionWithMRApp}} is a related test case. It
was failing because I changed the threshold of the estimate that triggers a new speculative
task. I fixed that default behavior in the new patch.
* {{hadoop.mapred.TestLocalMRNotification}} and {{hadoop.mapreduce.v2.TestMROldApiJobs}} seem
to be random failures. They pass on my local machine.

> Tuning TaskRuntimeEstimator 
> ----------------------------
>                 Key: MAPREDUCE-7208
>                 URL: https://issues.apache.org/jira/browse/MAPREDUCE-7208
>             Project: Hadoop Map/Reduce
>          Issue Type: Improvement
>            Reporter: Ahmed Hussein
>            Assignee: Ahmed Hussein
>            Priority: Minor
>         Attachments: MAPREDUCE-7208.001.patch, MAPREDUCE-7208.002.patch, smoothing-exponential.md
> By default, MR uses LegacyTaskRuntimeEstimator to get an estimate of the runtime.  The
estimator does not adjust dynamically to the progress rate of the tasks. On the other hand,
the behavior of the existing alternative, "ExponentiallySmoothedTaskRuntimeEstimator", is unpredictable.
> There are several dimensions to improve the exponential implementation:
>  # Exponential smoothing needs a warmup period. Otherwise, the estimate will be skewed
by the initial values.
>  # Using a single smoothing factor (Lambda) does not work well for all the tasks. To
increase the level of smoothing across the majority of tasks, we need to give a range of flexibility
to dynamically adjust the smoothing factor based on the history of the task progress.
>  # Design-wise, it is better to separate the statistical model from the MR interface. We
need to have a way to evaluate estimators statistically, without the need to run MR. For example,
an estimator can be evaluated as a black box by using a stream of raw data as input and testing
the accuracy of the generated stream of estimates.
>  # The exponential estimator speculates frequently, yet it fails to detect slowing tasks.
As a result, a taskAttempt that does not make any progress
won't trigger a new speculation.
> The file [^smoothing-exponential.md] describes how Simple Exponential smoothing factor
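
The warmup and black-box evaluation points in the list above can be illustrated with a minimal sketch. This is not the code in the attached patches: the class {{WarmupSmoother}}, its parameters, and the {{meanAbsoluteError}} helper are hypothetical names chosen for illustration, and the choice of a plain mean during warmup is an assumption. The smoother averages the first few samples before it starts applying the smoothing factor, and the evaluation helper feeds it a raw stream and scores its one-step-ahead estimates without running MR.

```java
/** Hypothetical sketch of simple exponential smoothing with a warmup period. */
final class WarmupSmoother {
    private final double lambda;     // smoothing factor in (0, 1]; weight of the newest sample
    private final int warmupSamples; // number of samples averaged before smoothing starts
    private int seen = 0;
    private double warmupSum = 0.0;
    private double estimate = Double.NaN;

    WarmupSmoother(double lambda, int warmupSamples) {
        if (!(lambda > 0.0 && lambda <= 1.0) || warmupSamples < 1) {
            throw new IllegalArgumentException("lambda must be in (0, 1] and warmupSamples >= 1");
        }
        this.lambda = lambda;
        this.warmupSamples = warmupSamples;
    }

    /** Feeds one raw observation and returns the updated estimate. */
    double update(double sample) {
        seen++;
        if (seen <= warmupSamples) {
            // Assumption: use a plain running mean during warmup so that a single
            // early outlier does not dominate the initial estimate.
            warmupSum += sample;
            estimate = warmupSum / seen;
        } else {
            estimate = lambda * sample + (1.0 - lambda) * estimate;
        }
        return estimate;
    }

    /**
     * Black-box evaluation: treats each estimate as the forecast for the next
     * sample and returns the mean absolute error over the stream.
     */
    static double meanAbsoluteError(WarmupSmoother smoother, double[] stream) {
        double totalError = 0.0;
        int scored = 0;
        double forecast = Double.NaN;
        for (double x : stream) {
            if (!Double.isNaN(forecast)) {
                totalError += Math.abs(x - forecast);
                scored++;
            }
            forecast = smoother.update(x);
        }
        return scored == 0 ? Double.NaN : totalError / scored;
    }
}
```

Because the evaluation helper only sees a stream of numbers, different smoothing factors or warmup lengths could be compared on recorded progress data in a unit test, which is the kind of separation the third point asks for.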
