hadoop-mapreduce-issues mailing list archives

From "Jason Lowe (JIRA)" <j...@apache.org>
Subject [jira] [Commented] (MAPREDUCE-4788) Job are marking as FAILED even if there are no failed tasks in it
Date Wed, 16 Dec 2015 16:15:46 GMT

    [ https://issues.apache.org/jira/browse/MAPREDUCE-4788?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15060228#comment-15060228 ]

Jason Lowe commented on MAPREDUCE-4788:
---------------------------------------

I'm confused as to why increasing the sleep period is an appropriate fix for this.  Even if
the AM doesn't stick around, the job client should be redirected to the history server once
the AM has exited.  Does the job history also record the wrong state?

Normally, for a job to fail, at least one task must fail (ignoring the cases where we fail
during job init or job commit).  Can someone explain the sequence of events that allows the
job to be marked failed due to task failure when no tasks are in the FAILED state?  A job
normally fails because a task reported failure, and at that point that task should be in the
FAILED state.  Is there an AM log or some other evidence that shows the sequence of state
transitions that leads to this problem?

> Job are marking as FAILED even if there are no failed tasks in it
> -----------------------------------------------------------------
>
>                 Key: MAPREDUCE-4788
>                 URL: https://issues.apache.org/jira/browse/MAPREDUCE-4788
>             Project: Hadoop Map/Reduce
>          Issue Type: Bug
>          Components: applicationmaster
>    Affects Versions: 2.6.0
>            Reporter: Devaraj K
>         Attachments: MAPREDUCE-4788.patch
>
>
> Sometimes jobs are marked as FAILED while some of their tasks are marked as KILLED.
> In MRAppMaster, a JobFinishEvent is triggered, followed by a wait of 5000 ms. If any
> task's final state is still unknown by that time, those tasks are marked as KILLED and the
> job state is marked as FAILED.
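
A toy model of the behavior described in the issue may help frame the question. This is a
minimal sketch, not the MRAppMaster code: TaskState, finalize_job, and
failed_without_failed_task are hypothetical names, and the rule that any killed task makes the
job FAILED is an assumption taken from the issue description. The 5000 ms wait itself is not
modeled, only the task states at the moment it expires.

```python
from enum import Enum


class TaskState(Enum):
    # Hypothetical, simplified task states (not the real Hadoop enum).
    RUNNING = "RUNNING"
    SUCCEEDED = "SUCCEEDED"
    FAILED = "FAILED"
    KILLED = "KILLED"


# States in which a task's final outcome is known.
TERMINAL = {TaskState.SUCCEEDED, TaskState.FAILED, TaskState.KILLED}


def finalize_job(task_states):
    """Model the state of the job once the wait has expired.

    Assumption from the issue description: tasks whose final state is
    still unknown are marked KILLED, and the job is then reported FAILED
    unless every task succeeded.
    """
    final = {
        task: (state if state in TERMINAL else TaskState.KILLED)
        for task, state in task_states.items()
    }
    all_succeeded = all(s is TaskState.SUCCEEDED for s in final.values())
    job_state = "SUCCEEDED" if all_succeeded else "FAILED"
    return job_state, final


def failed_without_failed_task(job_state, final):
    """True when the job is FAILED although no task is in the FAILED state."""
    return job_state == "FAILED" and TaskState.FAILED not in final.values()


# One task has not reported its final state when the wait expires.
job_state, final = finalize_job(
    {"m_0": TaskState.SUCCEEDED, "m_1": TaskState.RUNNING}
)
print(job_state, final["m_1"].value, failed_without_failed_task(job_state, final))
```

Under these assumptions the model reproduces the reported symptom: the job ends up FAILED with
one KILLED task and no FAILED task. Whether the real AM follows this path is exactly what an
AM log would need to confirm.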



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)
