hadoop-mapreduce-issues mailing list archives

From "Dick King (JIRA)" <j...@apache.org>
Subject [jira] Commented: (MAPREDUCE-2039) Improve speculative execution
Date Wed, 08 Sep 2010 22:45:35 GMT

    [ https://issues.apache.org/jira/browse/MAPREDUCE-2039?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12907469#action_12907469 ]

Dick King commented on MAPREDUCE-2039:
--------------------------------------

I would like to open discussion on being especially eager to speculate tasks that have already
failed, at least in cases where the job's acceptable task failure proportion is set so that
one more failed task of this kind will sink the job.  The intuition is that it is a Good Thing
to eject a job that will not succeed as soon as possible.
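A minimal sketch of the trigger condition described above. It assumes the job fails once its share of permanently failed tasks *exceeds* a configured percentage; the helper names and the strict comparison are hypothetical and are not Hadoop's scheduler code.

```python
def one_more_failure_sinks_job(failed_tasks: int, total_tasks: int,
                               max_failures_percent: float) -> bool:
    """True if one more permanently failed task would push the job's failed-task
    share above its tolerance (hypothetical helper; 'exceeds' is assumed)."""
    # Compare (failed + 1) / total > max% / 100 without floating-point division.
    return (failed_tasks + 1) * 100.0 > max_failures_percent * total_tasks


def should_speculate_eagerly(task_failed_attempts: int, failed_tasks: int,
                             total_tasks: int, max_failures_percent: float) -> bool:
    """Eager speculation applies only to tasks that have already failed at least
    once, and only when another failure of this task would sink the job."""
    if task_failed_attempts == 0:
        return False
    return one_more_failure_sinks_job(failed_tasks, total_tasks, max_failures_percent)


if __name__ == "__main__":
    # With a 0% tolerance, any failed task sinks the job.
    print(should_speculate_eagerly(1, 0, 100, 0.0))
    # With a 5% tolerance and no failed tasks yet, there is still slack.
    print(should_speculate_eagerly(1, 0, 100, 5.0))
```

With a tolerance of zero (any permanently failed task kills the job), every task that has failed once would qualify, which is presumably the case this comment cares most about.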

A case can be made for a policy of _always_ starting the second and third and even the _fourth_
task attempts simultaneously.  With most kinds of sporadic failure [which is rare], we would
waste few resources, because it just wouldn't happen very often, and we would probably gain a
fair amount in the "killer data record" case.  However, such a policy would work spectacularly
badly when there's a bad node that fails all of its tasks.
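A back-of-the-envelope model of this trade-off, assuming each attempt fails independently with probability p (the sporadic case) and at most k attempts are allowed. Every launched attempt is counted as a full attempt, which overstates the cost of parallel attempts that would be killed once a sibling succeeds. The function names are illustrative only.

```python
def expected_attempts_sequential(p: float, k: int) -> float:
    """Expected attempts when attempt i+1 starts only after attempt i fails."""
    # Attempt i+1 runs iff the first i attempts all failed: probability p**i.
    return sum(p ** i for i in range(k))


def expected_attempts_parallel_retry(p: float, k: int) -> float:
    """Expected attempts when, after a first failure, attempts 2..k all start at once."""
    return 1.0 + p * (k - 1)


def rounds_for_deterministic_failure(k: int, parallel: bool) -> int:
    """Attempt-durations before a task that always fails is declared failed."""
    # Sequential: k rounds. Parallel: the first attempt, then attempts 2..k together.
    return 2 if parallel and k > 1 else k


if __name__ == "__main__":
    for p in (0.01, 1.0):
        print(p, expected_attempts_sequential(p, 4), expected_attempts_parallel_retry(p, 4))
    print(rounds_for_deterministic_failure(4, False), rounds_for_deterministic_failure(4, True))
```

With p = 0.01 and k = 4 the sequential policy costs about 1.0101 attempts per task and the parallel one 1.03, roughly 2% extra. For a "killer data record" (p = 1) both use all 4 attempts, but the parallel policy reaches the verdict in 2 attempt-durations instead of 4. The independence assumption is exactly what a bad node violates.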

Here are the questions I would like to answer.

   * Should tasks that have already failed once be speculated more eagerly than ordinary tasks at all?
   * What form should this eagerness take?

Possibilities, from less to more aggressive, are:

   * Speculate normally
   * Speculate only when the task is underperforming, but use a more aggressive estimate of how long the task will take to finish
   * Use the more aggressive estimate in the underperformance test itself, so that a task that is not actually underperforming can appear to be underperforming
   * As above, and consider speculating the task even if the job still has unlaunched tasks it could be running instead
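The four options above can be sketched as an ordered policy level. This is a hypothetical model: "underperforming" is reduced to a single comparison of estimated remaining time against a threshold, the "more aggressive estimate" is a fixed `pessimism` multiplier, and unlaunched tasks are assumed to take priority over speculation unless the most aggressive level is selected.

```python
from enum import IntEnum


class FailedTaskEagerness(IntEnum):
    """The four options, ordered from least to most aggressive (hypothetical names)."""
    NORMAL = 0                   # treat a previously failed task like any other
    PESSIMISTIC_RANKING = 1      # same eligibility test; pessimistic estimate for ranking only
    PESSIMISTIC_ELIGIBILITY = 2  # pessimistic estimate also used in the underperformance test
    IGNORE_PENDING_WORK = 3      # as above, even while the job has unlaunched tasks


def _estimate(estimate_s: float, previously_failed: bool, applies: bool,
              pessimism: float) -> float:
    # Inflate the remaining-time estimate only for tasks that already failed once.
    return estimate_s * pessimism if previously_failed and applies else estimate_s


def is_candidate(estimate_s: float, threshold_s: float, previously_failed: bool,
                 job_has_pending_tasks: bool, level: FailedTaskEagerness,
                 pessimism: float = 2.0) -> bool:
    """Decide whether an attempt is eligible for speculation at all."""
    if job_has_pending_tasks and not (
            previously_failed and level >= FailedTaskEagerness.IGNORE_PENDING_WORK):
        return False  # assumed: unlaunched tasks normally take priority
    est = _estimate(estimate_s, previously_failed,
                    level >= FailedTaskEagerness.PESSIMISTIC_ELIGIBILITY, pessimism)
    return est > threshold_s


def ranking_key(estimate_s: float, previously_failed: bool,
                level: FailedTaskEagerness, pessimism: float = 2.0) -> float:
    """Remaining-time estimate used to order eligible candidates (larger = first)."""
    return _estimate(estimate_s, previously_failed,
                     level >= FailedTaskEagerness.PESSIMISTIC_RANKING, pessimism)


if __name__ == "__main__":
    for level in FailedTaskEagerness:
        # A once-failed task that looks fine (60 s left vs. a 100 s threshold).
        print(level.name, is_candidate(60, 100, True, False, level),
              is_candidate(60, 100, True, True, level), ranking_key(60, True, level))
```

Only the last two levels change *which* tasks are eligible; the second level merely reorders tasks that were already eligible.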

I will begin this discussion by pointing out that most failures are either user code bugs
or problems with a node, and that they're comparatively rare.  I conjecture that even if we
always run the second and third copies of a failed task simultaneously, we would waste few
resources, because it just wouldn't happen very often, and we would probably gain a fair amount.
I will, however, point out that if we make failed-task speculation aggressive enough that most
failed tasks get speculated, a bad node that fails all of its tasks will effectively make
additional computing capacity unavailable, because the retries of its failed tasks will be
duplicated.  A bad node whose problem causes most tasks to fail quickly can spin off failures
rapidly and leave a large number of duplicated tasks in its wake.
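The bad-node effect can be sized with Little's law (occupied slots = arrival rate x residence time). This sketch assumes, hypothetically, that every slot on the bad node fails its task after `fail_after_s` seconds, that each failure's retry is launched with `extra_copies` duplicates on other nodes, and that a duplicate runs for about one full task duration before it is killed; the cap at total cluster capacity is ignored.

```python
def slots_lost_to_duplicates(bad_node_slots: int, fail_after_s: float,
                             task_duration_s: float, extra_copies: int = 1) -> float:
    """Steady-state number of slots elsewhere occupied by duplicate retries."""
    # Failures per second from the bad node: bad_node_slots / fail_after_s.
    # Each failure adds extra_copies duplicates that each live ~task_duration_s.
    return bad_node_slots * extra_copies * task_duration_s / fail_after_s


if __name__ == "__main__":
    # A 4-slot bad node, 300 s tasks: fast failures are far more expensive.
    for fail_after in (300.0, 10.0, 1.0):
        print(fail_after, slots_lost_to_duplicates(4, fail_after, 300.0))
```

Under these assumptions a 4-slot node that fails each task after 10 s ties up about 120 duplicate slots elsewhere, and one that fails after 1 s ties up about 1200, which is why quickly failing bad nodes are the worrying case.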

> Improve speculative execution
> -----------------------------
>
>                 Key: MAPREDUCE-2039
>                 URL: https://issues.apache.org/jira/browse/MAPREDUCE-2039
>             Project: Hadoop Map/Reduce
>          Issue Type: Improvement
>            Reporter: Dick King
>            Assignee: Dick King
>
> In speculation, the framework issues a second task attempt on a task where one attempt
> is already running.  This is useful if the running attempt is bogged down for reasons outside
> of the task's code, so that a second attempt can finish ahead of the existing attempt even though
> the first attempt has a head start.
> Early versions of speculation had the weakness that an attempt that starts out well but
> breaks down near the end would never get speculated.  That got fixed in HADOOP-2141, but
> in the fix the speculation wouldn't engage until the performance of the old attempt, _even
> counting the early portion where it progressed normally_, was significantly worse than average.
> I want to fix that by overweighting the more recent progress increments.  In particular,
> I would like to use exponential smoothing with a lambda of approximately 1/minute [which is
> the time scale of speculative execution] to measure progress per unit time.  This affects
> the speculation code in two places:
>    * It affects the set of task attempts we consider to be underperforming
>    * It affects our estimates of when we expect tasks to finish.  This could be hugely
>      important; speculation's main benefit is that it gets a single outlier task finished earlier
>      than otherwise possible, and we need to know which task is the outlier as accurately as possible.
> I would like a rich suite of configuration variables, minimally including lambda and
> possibly weighting factors.  We might have two exponentially smoothed tracking variables of
> the progress rate, to distinguish attempts that are bogged down and getting worse from those
> that are bogged down but improving.
> Perhaps we should be especially eager to speculate a second attempt.  If a task is deterministically
> failing after bogging down [think "rare infinite loop bug"], we would rather run a couple
> of our attempts in parallel to discover the problem sooner.
> As part of this patch we would like to add benchmarks that simulate rare tasks that behave
> poorly, so we can discover whether this change in the code is a good idea and what the proper
> configuration is.  Early versions of this will be driven by our assumptions.  Later versions
> will be driven by the fruits of MAPREDUCE-2037.
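A minimal sketch of the smoothed progress-rate tracker proposed in the issue, assuming progress is reported as a fraction in [0, 1] at possibly irregular times. To cope with uneven report intervals, the smoothing weight is derived from elapsed time, alpha = 1 - exp(-lambda * dt), so lambda stays a decay rate per second (1/60 for "approximately 1/minute"). The second, slower tracker (lambda = 1/600) illustrates the "getting worse vs. improving" idea; that value, the class names, and the scenario at the end are hypothetical.

```python
import math
from typing import Optional


class SmoothedRate:
    """Exponentially smoothed progress rate (progress per second).

    lam is a decay rate in 1/seconds; an observation dt seconds old keeps
    weight exp(-lam * dt), so lam = 1/60 emphasizes roughly the last minute.
    """

    def __init__(self, lam: float) -> None:
        self.lam = lam
        self.rate: Optional[float] = None

    def update(self, d_progress: float, dt: float) -> None:
        sample = d_progress / dt
        if self.rate is None:
            self.rate = sample  # the first report seeds the estimate
        else:
            # A longer gap since the last report gives the new sample more weight.
            alpha = 1.0 - math.exp(-self.lam * dt)
            self.rate += alpha * (sample - self.rate)


class AttemptTracker:
    """Tracks one task attempt with a fast and a slow smoothed rate (hypothetical)."""

    def __init__(self, fast_lam: float = 1 / 60, slow_lam: float = 1 / 600) -> None:
        self.progress = 0.0
        self.elapsed = 0.0
        self.fast = SmoothedRate(fast_lam)
        self.slow = SmoothedRate(slow_lam)

    def report(self, progress: float, elapsed: float) -> None:
        dt = elapsed - self.elapsed
        dp = progress - self.progress
        self.fast.update(dp, dt)
        self.slow.update(dp, dt)
        self.progress, self.elapsed = progress, elapsed

    def remaining_cumulative(self) -> float:
        # Whole-history average rate: the behavior the proposal wants to move away from.
        return (1.0 - self.progress) / (self.progress / self.elapsed)

    def remaining_smoothed(self) -> float:
        return (1.0 - self.progress) / self.fast.rate

    def getting_worse(self) -> bool:
        # Recent rate below the longer-run rate: the attempt is slowing down.
        return self.fast.rate < self.slow.rate


# Invented scenario: reports every 10 s; 1%/s for 50 s, then 0.1%/s for 300 s.
t = AttemptTracker()
progress, now = 0.0, 0.0
for _ in range(5):
    now += 10.0
    progress += 0.10
    t.report(progress, now)
for _ in range(30):
    now += 10.0
    progress += 0.01
    t.report(progress, now)

print(f"cumulative estimate: {t.remaining_cumulative():.1f} s")
print(f"smoothed estimate:   {t.remaining_smoothed():.1f} s")
print(f"getting worse: {t.getting_worse()}")
```

In this scenario about 200 s of work truly remain. The whole-history rate predicts roughly 88 s, so the early fast portion hides the slowdown, while the smoothed rate predicts roughly 189 s, and the fast tracker falling below the slow one flags the attempt as getting worse.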

-- 
This message is automatically generated by JIRA.

