hadoop-mapreduce-issues mailing list archives

From "Hemanth Yamijala (JIRA)" <j...@apache.org>
Subject [jira] Commented: (MAPREDUCE-1143) runningMapTasks counter is not properly decremented in case of failed Tasks.
Date Mon, 16 Nov 2009 07:12:40 GMT

    [ https://issues.apache.org/jira/browse/MAPREDUCE-1143?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12778233#action_12778233 ]

Hemanth Yamijala commented on MAPREDUCE-1143:
---------------------------------------------

I spoke to Amarsri and Rahul about my comments and got some explanations:

bq. For instance, even after this patch, I see that the number of running tasks is decremented
under different checks when a task completes and when a task fails. I assume this is for good
reason, but still it is difficult to review.

So, the different checks are as follows:

{code}
completedTask() {
  if (this tip is complete) {
    return;
  }
  update counters
}

failedTask() {
  if (any attempt was running for this tip before status update) {
    update counters
  }
}
{code}

It appears completedTask doesn't need the check for the TIP being complete at all, as that case
can never happen. A tip is marked complete only if at least one attempt has completed, and it
remains so. If another attempt then reports success, we fail it in the status update and do not
follow the completedTask code path at all. So, for all practical purposes, counters are
updated unconditionally in completedTask. Further, in the same code path, the task is removed
from the active tasks as well. Hence no further check is necessary.

The check in failedTask is required, though. This is because a task can fail *after* it has
been marked as succeeded, for example if there are fetch failures for a map, or if a tracker
is lost. In this case, we should not update counters again because they would have already
been updated when the task succeeded.

However, in this context, I am a little worried that we check whether any attempt was
running before the status update, rather than this specific attempt. At least in theory,
this could result in some inconsistency.

Consider this sequence of events:
- A task is scheduled.
- It is speculated.
- The original attempt completes -> Counters are decremented here.
- The original attempt fails (lost TT, fetch failures) -> The current patch will decrement counters here again.
- The speculated attempt succeeds.
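
To make the sequence above concrete, here is a minimal sketch that replays it against a toy model of the "any attempt was running" check. This is not the JobTracker code: the class, the method bodies and the attempt names are hypothetical. It also assumes the counter is incremented once per launched attempt, and that the speculated attempt's success goes through completedTask because the tip is no longer complete once the original attempt has failed.

```java
import java.util.HashMap;
import java.util.Map;

// Hypothetical toy model; it does not reproduce Hadoop's JobInProgress or TaskInProgress.
class AnyAttemptCheckSketch {
    enum State { RUNNING, SUCCEEDED, FAILED }

    // Assumption: incremented once for every launched attempt of the tip.
    int runningMapTasks = 0;
    final Map<String, State> attempts = new HashMap<>();

    void launch(String attempt) {
        attempts.put(attempt, State.RUNNING);
        runningMapTasks++;
    }

    void completedTask(String attempt) {
        attempts.put(attempt, State.SUCCEEDED);
        runningMapTasks--; // updated unconditionally, as argued for completedTask
    }

    void failedTask(String attempt) {
        // "any attempt was running for this tip before status update"
        boolean anyRunning = attempts.containsValue(State.RUNNING);
        attempts.put(attempt, State.FAILED);
        if (anyRunning) {
            runningMapTasks--;
        }
    }

    // Replays the event sequence and returns the final counter value.
    static int replay() {
        AnyAttemptCheckSketch tip = new AnyAttemptCheckSketch();
        tip.launch("attempt_0");        // a task is scheduled
        tip.launch("attempt_1");        // it is speculated
        tip.completedTask("attempt_0"); // original attempt completes: decrement
        tip.failedTask("attempt_0");    // lost TT or fetch failure: decremented again
        tip.completedTask("attempt_1"); // speculated attempt succeeds: decrement
        return tip.runningMapTasks;
    }

    public static void main(String[] args) {
        System.out.println("runningMapTasks after replay: " + replay());
    }
}
```

Under these assumptions two launches are matched by three decrements, so the counter ends one below zero.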

In practice, though, this scenario may not be very likely. Apparently, fetch failures and lost
TTs are the only cases in which this is possible, and there can be a considerable time lag
between a task completing and it having to be failed. In most cases that lag will be large
enough for the speculative attempt to have been killed as well.

With this background, is it worth changing the current patch to:

{code}
failedTask() {
  if (this task was running before status update) {
    update counters
  }
}
{code}
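
As a rough illustration of the per-attempt check proposed above, the sketch below replays the speculation scenario (schedule, speculate, complete, fail, speculated attempt succeeds) against a toy model. The class, method bodies and attempt names are hypothetical and do not come from the patch. The model assumes the counter is incremented once per launched attempt.

```java
import java.util.HashMap;
import java.util.Map;

// Hypothetical toy model; it does not reproduce Hadoop's JobInProgress or TaskInProgress.
class PerAttemptCheckSketch {
    enum State { RUNNING, SUCCEEDED, FAILED }

    // Assumption: incremented once for every launched attempt of the tip.
    int runningMapTasks = 0;
    final Map<String, State> attempts = new HashMap<>();

    void launch(String attempt) {
        attempts.put(attempt, State.RUNNING);
        runningMapTasks++;
    }

    void completedTask(String attempt) {
        attempts.put(attempt, State.SUCCEEDED);
        runningMapTasks--;
    }

    void failedTask(String attempt) {
        // "this task was running before status update"
        boolean wasRunning = attempts.get(attempt) == State.RUNNING;
        attempts.put(attempt, State.FAILED);
        if (wasRunning) {
            runningMapTasks--;
        }
    }

    // Speculation scenario: the original attempt fails after it has succeeded.
    static int replaySpeculation() {
        PerAttemptCheckSketch tip = new PerAttemptCheckSketch();
        tip.launch("attempt_0");
        tip.launch("attempt_1");
        tip.completedTask("attempt_0"); // decrement
        tip.failedTask("attempt_0");    // already succeeded: no second decrement
        tip.completedTask("attempt_1"); // decrement
        return tip.runningMapTasks;
    }

    // Ordinary failure: an attempt that is still running fails.
    static int replayPlainFailure() {
        PerAttemptCheckSketch tip = new PerAttemptCheckSketch();
        tip.launch("attempt_0");
        tip.failedTask("attempt_0");    // was running: decrement
        return tip.runningMapTasks;
    }

    public static void main(String[] args) {
        System.out.println("speculation scenario: " + replaySpeculation());
        System.out.println("plain failure: " + replayPlainFailure());
    }
}
```

In this model each launched attempt is decremented exactly once, so both scenarios leave the counter at zero.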

This seems more correct to me, but I was wondering if it is worth the change. Thoughts?

> runningMapTasks counter is not properly decremented in case of failed Tasks.
> ----------------------------------------------------------------------------
>
>                 Key: MAPREDUCE-1143
>                 URL: https://issues.apache.org/jira/browse/MAPREDUCE-1143
>             Project: Hadoop Map/Reduce
>          Issue Type: Bug
>            Reporter: rahul k singh
>            Priority: Blocker
>         Attachments: MAPRED-1143-1.patch, MAPRED-1143-2.patch, MAPRED-1143-2.patch, MAPRED-1143-3.patch,
MAPRED-1143-4.patch, MAPRED-1143-ydist-1.patch, MAPRED-1143-ydist-2.patch, MAPRED-1143-ydist-3.patch,
MAPRED-1143-ydist-4.patch, MAPRED-1143-ydist-5.patch, MAPRED-1143-ydist-6.patch
>
>


-- 
This message is automatically generated by JIRA.
-
You can reply to this email to add a comment to the issue online.

