hadoop-mapreduce-issues mailing list archives

From "Dick King (JIRA)" <j...@apache.org>
Subject [jira] Commented: (MAPREDUCE-2039) Improve speculative execution
Date Fri, 27 Aug 2010 23:45:53 GMT

    [ https://issues.apache.org/jira/browse/MAPREDUCE-2039?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12903717#action_12903717 ]

Dick King commented on MAPREDUCE-2039:
--------------------------------------

The runtime space requirements for this will be noticeable but modest.  Each task in progress
will need a {{float}} or two for the exponentially smoothed value, plus an {{int}} for the
time of the most recent update [needed for the exponential smoothing calculation].  Although we
internally represent times as a {{long}}, an {{int}} is enough here because the wrap-around time is
47 days.  Jobs, and therefore tasks, can't run that long anyway, for other reasons.
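To make that space budget concrete, here is a minimal Java sketch of a per-attempt record holding two floats and one int.  The class and member names are hypothetical, not Hadoop's API, and the sketch assumes the int holds the low 32 bits of a millisecond timestamp (the comment does not say how the int is derived).  Under that assumption a 32-bit counter wraps after 2^32 ms, roughly 49.7 days, so any gap shorter than that can be recovered by wrap-around subtraction read as unsigned; the 47-day figure sits inside that bound.

```java
/**
 * Hypothetical compact per-attempt record (illustrative names, not Hadoop's API):
 * two floats plus one int, matching the space budget described above.
 */
final class CompactAttemptState {
    float smoothedRate;    // exponentially smoothed progress per millisecond
    float lastProgress;    // progress fraction at the most recent update
    int lastUpdateMillis;  // assumption: low 32 bits of the last update time, in ms

    /** Days until a 32-bit millisecond counter wraps: 2^32 ms is about 49.7 days. */
    static final double WRAP_DAYS = Math.pow(2, 32) / (24.0 * 60 * 60 * 1000);

    /** Keep only the low 32 bits of a millisecond timestamp such as System.currentTimeMillis(). */
    static int truncate(long millis) {
        return (int) millis;
    }

    /**
     * Elapsed milliseconds between two truncated timestamps.  The int subtraction wraps
     * modulo 2^32, and reading the result as unsigned is correct for any gap under 2^32 ms.
     */
    static long elapsedMillis(int earlier, int later) {
        return Integer.toUnsignedLong(later - earlier);
    }
}
```

Reading the difference as unsigned, rather than as a signed int, is what stretches the usable window from about 24.9 days (2^31 ms) to about 49.7 days.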

> Improve speculative execution
> -----------------------------
>
>                 Key: MAPREDUCE-2039
>                 URL: https://issues.apache.org/jira/browse/MAPREDUCE-2039
>             Project: Hadoop Map/Reduce
>          Issue Type: Improvement
>            Reporter: Dick King
>            Assignee: Dick King
>
> In speculation, the framework issues a second task attempt on a task where one attempt
> is already running.  This is useful if the running attempt is bogged down for reasons outside
> of the task's code, so that a second attempt can finish ahead of the existing attempt even though
> the first attempt has a head start.
> Early versions of speculation had the weakness that an attempt that starts out well but
> breaks down near the end would never get speculated.  That got fixed in HADOOP-2141, but
> in the fix the speculation wouldn't engage until the performance of the old attempt, _even
> counting the early portion where it progressed normally_, was significantly worse than average.
> I want to fix that by overweighting the more recent progress increments.  In particular,
> I would like to use exponential smoothing with a lambda of approximately 1/minute [which is
> the time scale of speculative execution] to measure progress per unit time.  This affects
> the speculation code in two places:
>    * It affects the set of task attempts we consider to be underperforming.
>    * It affects our estimates of when we expect tasks to finish.  This could be hugely
>      important; speculation's main benefit is that it gets a single outlier task finished earlier
>      than otherwise possible, and we need to know which task is the outlier as accurately as possible.
> I would like a rich suite of configuration variables, minimally including lambda and
> possibly weighting factors.  We might have two exponentially smoothed tracking variables of
> the progress rate, to distinguish attempts that are bogged down and getting worse from those
> that are bogged down but improving.
> Perhaps we should be especially eager to speculate a second attempt.  If a task is deterministically
> failing after bogging down [think "rare infinite loop bug"], we would rather spend a couple
> of our attempts in parallel to discover the problem sooner.
> As part of this patch we would like to add benchmarks that simulate rare tasks that behave
> poorly, so we can discover whether this change in the code is a good idea and what the proper
> configuration is.  Early versions of this will be driven by our assumptions.  Later versions
> will be driven by the fruits of MAPREDUCE-2037.
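The smoothing proposed in the issue can be sketched as follows.  This is an assumed formulation, not the patch: because progress reports arrive at irregular intervals, each new instantaneous rate (progress increment divided by elapsed time dt) is blended in with weight 1 - exp(-lambda * dt), so older increments decay with time constant 1/lambda.  Pairing a fast tracker with a slow one is one hypothetical way to tell "getting worse" from "improving", and dividing the remaining progress by the smoothed rate gives a finish-time estimate.  All class and method names are illustrative.

```java
/** Hypothetical exponentially smoothed progress-rate tracker for irregular progress reports. */
final class ProgressRateTracker {
    private final double lambdaPerMs;   // decay rate; 1/minute is 1/60000 per ms
    private double rate = Double.NaN;   // smoothed progress fraction per ms; NaN until known
    private double lastProgress;        // progress fraction at the last accepted report
    private long lastTimeMs;            // time of the last accepted report
    private boolean started;            // whether a baseline report has been seen

    ProgressRateTracker(double lambdaPerMinute) {
        this.lambdaPerMs = lambdaPerMinute / 60_000.0;
    }

    void report(double progress, long timeMs) {
        if (!started) {                 // the first report only sets the baseline
            started = true;
            lastProgress = progress;
            lastTimeMs = timeMs;
            return;
        }
        long dt = timeMs - lastTimeMs;
        if (dt <= 0) {
            return;                     // ignore duplicate or out-of-order reports
        }
        double instant = (progress - lastProgress) / dt;
        if (Double.isNaN(rate)) {
            rate = instant;             // the first interval seeds the average
        } else {
            // The new sample's weight grows with the time since the previous report.
            double alpha = 1.0 - Math.exp(-lambdaPerMs * dt);
            rate += alpha * (instant - rate);
        }
        lastProgress = progress;
        lastTimeMs = timeMs;
    }

    double rate() {
        return rate;
    }

    /** Milliseconds until progress reaches 1.0 at the smoothed rate; infinite if stalled. */
    double estimatedRemainingMs() {
        if (!(rate > 0)) {
            return Double.POSITIVE_INFINITY;
        }
        return (1.0 - lastProgress) / rate;
    }
}

/** Hypothetical pair of trackers: a fast one falling below a slow one suggests worsening. */
final class DualRateTracker {
    final ProgressRateTracker fast;
    final ProgressRateTracker slow;

    DualRateTracker(double fastLambdaPerMinute, double slowLambdaPerMinute) {
        fast = new ProgressRateTracker(fastLambdaPerMinute);
        slow = new ProgressRateTracker(slowLambdaPerMinute);
    }

    void report(double progress, long timeMs) {
        fast.report(progress, timeMs);
        slow.report(progress, timeMs);
    }

    /** True when recent progress is slower than the longer-term trend. */
    boolean gettingWorse() {
        return fast.rate() < slow.rate();
    }
}
```

With lambda near 1/minute, a report arriving one minute after the previous one gets weight 1 - e^-1, about 63%, so a stall shows up in the estimate within a minute or two instead of being diluted by the attempt's whole history.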

-- 
This message is automatically generated by JIRA.
-
You can reply to this email to add a comment to the issue online.

