hadoop-mapreduce-issues mailing list archives

From "chackaravarthy (JIRA)" <j...@apache.org>
Subject [jira] [Commented] (MAPREDUCE-4252) MR2 job never completes with 1 pending task
Date Wed, 29 Aug 2012 09:02:08 GMT

    [ https://issues.apache.org/jira/browse/MAPREDUCE-4252?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13443920#comment-13443920 ]

chackaravarthy commented on MAPREDUCE-4252:

Hi Tom,

This problem has been handled for the case where a speculative attempt is launched for a
map task and the other attempt fails (rather than being killed).
Can a similar scenario happen for a reduce task?

Consider the following scenario for a reduce task with speculation, where one attempt gets killed:
1. A task attempt is started.
2. A speculative task attempt for the same task is started.
3. The first task attempt completes and causes the task to transition to SUCCEEDED.
4. The speculative task attempt is then killed because the first attempt has completed.

As a result, an internal error is raised for this attempt (in *TaskImpl.MapRetroactiveKilledTransition*),
and the resulting task attempt failure leads to job failure.

if (!TaskType.MAP.equals(task.getType())) {
  LOG.error("Unexpected event for REDUCE task " + event.getType());
  task.internalError(event.getType());
}

So, do we also need the following check in MapRetroactiveKilledTransition, just like the
handling for the failed-attempt case?

if (event instanceof TaskTAttemptEvent) {
  TaskTAttemptEvent castEvent = (TaskTAttemptEvent) event;
  if (task.getState() == TaskState.SUCCEEDED &&
      !castEvent.getTaskAttemptID().equals(task.successfulAttempt)) {
    // don't allow a different task attempt to override a previous
    // succeeded state
    return TaskState.SUCCEEDED;
  }
}
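
To make the question concrete, here is a minimal, self-contained sketch of the scenario. It is a toy model, not the actual TaskImpl code: ToyTask, its State enum, onAttemptSucceeded, onAttemptKilled, runScenario, and the string attempt IDs "a0" and "a1" are all hypothetical names used only for illustration. It assumes that the kill of the speculative reduce attempt really does reach this transition after the task has succeeded, which is exactly the point that needs confirming.

```java
import java.util.Objects;

// Toy model only: ToyTask, State, and the methods below are hypothetical
// names, not part of Hadoop. Only the shape of the guard mirrors the check
// quoted in this comment.
class ToyTask {
    enum State { RUNNING, SUCCEEDED, INTERNAL_ERROR }

    final boolean isMap;
    State state = State.RUNNING;
    String successfulAttempt; // ID of the attempt that succeeded, if any

    ToyTask(boolean isMap) {
        this.isMap = isMap;
    }

    void onAttemptSucceeded(String attemptId) {
        state = State.SUCCEEDED;
        successfulAttempt = attemptId;
    }

    // Models an "attempt killed" event arriving after the task has succeeded.
    State onAttemptKilled(String attemptId, boolean withGuard) {
        if (withGuard && state == State.SUCCEEDED
                && !Objects.equals(attemptId, successfulAttempt)) {
            // A different attempt must not override the succeeded state.
            return state;
        }
        if (!isMap) {
            // Stand-in for the "Unexpected event for REDUCE task" path.
            state = State.INTERNAL_ERROR;
        }
        return state;
    }

    // Replays the four steps of the scenario for a reduce task.
    static State runScenario(boolean withGuard) {
        ToyTask reduce = new ToyTask(false);
        // Steps 1 and 2: attempt "a0" and speculative attempt "a1" start
        // (starting an attempt does not change task state in this model).
        reduce.onAttemptSucceeded("a0");                // step 3
        return reduce.onAttemptKilled("a1", withGuard); // step 4
    }

    public static void main(String[] args) {
        System.out.println("without guard: " + runScenario(false));
        System.out.println("with guard:    " + runScenario(true));
    }
}
```

In this model the reduce task ends in INTERNAL_ERROR without the guard and stays SUCCEEDED with it. Whether the real state machine routes the kill event this way for reduce tasks is the open question.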

Please check whether this is a valid case and share your suggestions.

> MR2 job never completes with 1 pending task
> -------------------------------------------
>                 Key: MAPREDUCE-4252
>                 URL: https://issues.apache.org/jira/browse/MAPREDUCE-4252
>             Project: Hadoop Map/Reduce
>          Issue Type: Bug
>          Components: mrv2
>    Affects Versions: 0.23.1
>            Reporter: Tom White
>            Assignee: Tom White
>             Fix For: 0.23.3, 2.1.0-alpha, 3.0.0
>         Attachments: MAPREDUCE-4252.patch, MAPREDUCE-4252.patch, MAPREDUCE-4252.patch, MAPREDUCE-4252.patch, MapReduce.png
>
> This was found by ATM:
> bq. I ran a teragen with 1000 map tasks. Many task attempts failed, but after 999 of the tasks had completed, the job is now sitting forever with 1 task "pending".

This message is automatically generated by JIRA.
If you think it was sent incorrectly, please contact your JIRA administrators
For more information on JIRA, see: http://www.atlassian.com/software/jira
