hadoop-mapreduce-issues mailing list archives

From "Adam Kramer (JIRA)" <j...@apache.org>
Subject [jira] Commented: (MAPREDUCE-1608) Allow users to do speculative execution of a task manually
Date Thu, 11 Nov 2010 00:45:15 GMT

    [ https://issues.apache.org/jira/browse/MAPREDUCE-1608?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12930867#action_12930867 ]

Adam Kramer commented on MAPREDUCE-1608:

As a user, I have found the ability to manually speculate tasks via the website incredibly
useful--so useful that I'm starting to worry about RSI, given that each speculation takes a
click to the task page, a click to the task, a click on speculate, and a click on the confirm
dialog box. The tasks I speculate are frequently lost-task-tracker failures, for which Hadoop
currently just sets a timeout.

But how am I beating the current system? I'm comparing some tasks' performance to other tasks
in the same job:

1) If there is only one task (either map or reduce), always speculate it. Maybe turn this off
for clusters that have very few slots, but in the case of >1000 slots or so, this is trivial
and would basically prevent jobs taking literally twice as long.

2) Collect data on other tasks in the same job. If 99% of mappers went from 0% complete to
>0% complete within 5 seconds, and the remaining mappers still have not changed after 5 minutes,
speculate them. Ditto for reducers. Unbalanced data may cause these problems.

3) Collect data on delays. If a task doesn't improve its % complete in some timeframe determined
by the other tasks for the same job, speculate the "hung" task.
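
The three rules above can be sketched roughly as follows. This is only an illustration of the
comparison against peer tasks, not Hadoop code: the TaskStatus fields, the function name, and
every threshold (1000 slots, 95% of peers started, 10x slack, 30-second floor) are assumptions
made up for the example.

```python
from dataclasses import dataclass
from statistics import median
from typing import List, Optional


@dataclass
class TaskStatus:
    """Hypothetical per-task snapshot; not a Hadoop class."""
    task_id: str
    progress: float                           # fraction complete, 0.0 to 1.0
    secs_running: float                       # time since the attempt started
    secs_to_first_progress: Optional[float]   # None while still at 0%
    secs_since_last_progress: float           # time since progress last increased


def tasks_to_speculate(tasks: List[TaskStatus], total_slots: int,
                       big_cluster_slots: int = 1000,
                       started_fraction: float = 0.95,
                       slack: float = 10.0,
                       floor_secs: float = 30.0) -> List[str]:
    """Return IDs of tasks that the three rules would speculate."""
    running = [t for t in tasks if t.progress < 1.0]
    if not running:
        return []

    # Rule 1: a single-task job on a large cluster is always speculated.
    if len(tasks) == 1 and total_slots >= big_cluster_slots:
        return [tasks[0].task_id]

    picked = []
    started = [t for t in tasks if t.secs_to_first_progress is not None]

    # Rule 2: once most peers have started, flag tasks still stuck at 0%
    # for much longer than peers typically needed to report progress.
    if started and len(started) >= started_fraction * len(tasks):
        typical = median(t.secs_to_first_progress for t in started)
        limit = max(floor_secs, slack * typical)
        picked += [t.task_id for t in running
                   if t.secs_to_first_progress is None and t.secs_running > limit]

    # Rule 3: flag started tasks whose progress has stalled far longer than
    # the typical time since the last update among the other running tasks.
    for t in running:
        peers = [p.secs_since_last_progress for p in running if p is not t]
        if not peers or t.task_id in picked or t.progress == 0.0:
            continue
        limit = max(floor_secs, slack * median(peers))
        if t.secs_since_last_progress > limit:
            picked.append(t.task_id)
    return picked
```

The median is used as the peer baseline so that one or two hung tasks do not drag the
threshold up with them; any other robust statistic would serve the same purpose.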

...in other words, I agree that there is probably an easy way to model the failed tasks, but
only from a modeling perspective. Getting the heuristics and models right and implementing
them is probably much, much more difficult than implementing "hadoop job -speculate-task task_identifier_here."

But also, implementing the latter is *necessary* to discover how and when the heuristics
themselves are failing...giving users the ability to do this also gives admins the ability
to see when users are doing this.

> Allow users to do speculative execution of a task manually
> ----------------------------------------------------------
>                 Key: MAPREDUCE-1608
>                 URL: https://issues.apache.org/jira/browse/MAPREDUCE-1608
>             Project: Hadoop Map/Reduce
>          Issue Type: New Feature
>            Reporter: Scott Chen
>            Assignee: Scott Chen
> Speculative execution improves the latency of the job. Sometimes the job has few very
> slow reducers. Spending a little more resource on speculative tasks can improve the latency
> a lot. It will be nice that the users can manually select one task and force the speculative
> execution on that task just like we can manually kill/fail task.
> The proposal is add link says "speculate" in taskdetails.jsp page where we do "kill/fail".
> Thoughts? 

This message is automatically generated by JIRA.
You can reply to this email to add a comment to the issue online.
