hadoop-mapreduce-issues mailing list archives

From "Karthik Kambatla (JIRA)" <j...@apache.org>
Subject [jira] [Commented] (MAPREDUCE-5110) Long task launch delays can lead to multiple parallel attempts of the task
Date Tue, 16 Apr 2013 00:04:16 GMT

    [ https://issues.apache.org/jira/browse/MAPREDUCE-5110?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13632397#comment-13632397 ]

Karthik Kambatla commented on MAPREDUCE-5110:
---------------------------------------------

Thanks for your response, Arun.

Let me take a step back and explain in detail:

As I see it, the issue this JIRA addresses is: "Where possible (i.e., barring transient network
partitions), run a single task attempt for a task when speculation is turned off". A JT-side
solution (a.k.a. MAPREDUCE-2217) spawns another task attempt, but doesn't kill the currently
running attempt before doing so. With a TT-side solution (the patch here), the currently running
attempt can be killed first, before another task attempt is spawned.

I see your point about avoiding TT changes where possible. I guess the trade-off is between a
marginal increase in TT code complexity (a timeout check and logging changes) and running multiple
attempts of the task. Given the low cost of the fix, I believe we should address this scenario,
which seems to be far more frequent than network partitions.
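
To make the TT-side timeout check concrete, here is a minimal sketch of the idea. It is not the code in the attached patches: the class LaunchTimeoutGuard, its methods, and the idea of passing the expiry interval in as a plain number are all hypothetical, chosen only for illustration. The assumption is that the TT remembers when it accepted an attempt and, just before launching, refuses (and logs) if the launch delay has already reached the interval after which the JT would have expired the attempt and scheduled another one.

```java
import java.util.Map;
import java.util.concurrent.ConcurrentHashMap;

/**
 * Hypothetical illustration of a TT-side launch timeout check.
 * None of these names come from the Hadoop code base or the patch.
 */
class LaunchTimeoutGuard {
    // Assumed to mirror the interval after which the JT expires an unlaunched attempt.
    private final long launchExpiryMillis;
    // Time at which each attempt was accepted by this (hypothetical) tracker.
    private final Map<String, Long> assignedAt = new ConcurrentHashMap<>();

    LaunchTimeoutGuard(long launchExpiryMillis) {
        this.launchExpiryMillis = launchExpiryMillis;
    }

    /** Record the time at which the attempt was handed to the tracker. */
    void recordAssignment(String attemptId, long nowMillis) {
        assignedAt.put(attemptId, nowMillis);
    }

    /**
     * Called just before the attempt would start running.
     * Returns true if it may launch, false if it should be killed instead.
     */
    boolean mayLaunch(String attemptId, long nowMillis) {
        Long assigned = assignedAt.remove(attemptId);
        if (assigned == null) {
            // Unknown attempt: be conservative and do not launch it.
            return false;
        }
        long delay = nowMillis - assigned;
        if (delay >= launchExpiryMillis) {
            // The "logging changes" part: say why the attempt is not being run.
            System.err.println("Not launching " + attemptId + ": launch delay of "
                + delay + " ms reached the expiry interval of "
                + launchExpiryMillis + " ms");
            return false;
        }
        return true;
    }
}
```

With a guard like this, an attempt whose launch was delayed past the interval never starts on the TT, so the attempt the JT scheduled in its place would be the only one running. Timestamps are passed in explicitly so the check is easy to exercise without waiting on a real clock.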
                
> Long task launch delays can lead to multiple parallel attempts of the task
> --------------------------------------------------------------------------
>
>                 Key: MAPREDUCE-5110
>                 URL: https://issues.apache.org/jira/browse/MAPREDUCE-5110
>             Project: Hadoop Map/Reduce
>          Issue Type: Bug
>          Components: tasktracker
>    Affects Versions: 1.1.2
>            Reporter: Karthik Kambatla
>            Assignee: Karthik Kambatla
>         Attachments: expose-mr-5110.patch, mr-5110.patch, mr-5110.patch, mr-5110-tt-only.patch
>
>
> If a task takes too long to launch, the JT expires the task and schedules another attempt.
> The earlier attempt can start after the later attempt, leading to two parallel attempts running
> at the same time. This is particularly an issue if the user turns off speculation and expects
> a single attempt of a task to run at any point in time.

--
This message is automatically generated by JIRA.
If you think it was sent incorrectly, please contact your JIRA administrators
For more information on JIRA, see: http://www.atlassian.com/software/jira
