hadoop-mapreduce-issues mailing list archives

From "Robert Joseph Evans (JIRA)" <j...@apache.org>
Subject [jira] [Commented] (MAPREDUCE-4428) A failed job is not available under job history if the job is killed right around the time job is notified as failed
Date Wed, 11 Jul 2012 18:55:36 GMT

    [ https://issues.apache.org/jira/browse/MAPREDUCE-4428?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13411853#comment-13411853 ]

Robert Joseph Evans commented on MAPREDUCE-4428:
------------------------------------------------

This can be caused by a number of things. Most likely it is a bug in the AM that makes it crash
before it can move the job history log to where the history server will pick it up. If that is
the case you need to get access to the AM logs so that we can look at what is happening there.
I am not familiar with what CDH4 has in the UI, but if you click on the application id in the
main RM web page you should see a link to the AM's logs.
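
As a rough alternative to clicking through the RM web page, the application report (state,
tracking URL, diagnostics) can also be pulled programmatically. This is only a sketch, assuming
the Hadoop 2.x YarnClient API; the class name and argument handling are illustrative, and the
exact client API may differ in 2.0.0-alpha:

import org.apache.hadoop.conf.Configuration;
import org.apache.hadoop.yarn.api.records.ApplicationId;
import org.apache.hadoop.yarn.api.records.ApplicationReport;
import org.apache.hadoop.yarn.client.api.YarnClient;
import org.apache.hadoop.yarn.util.ConverterUtils;

public class AmInfo {
  public static void main(String[] args) throws Exception {
    // args[0] is the application id shown on the RM web page,
    // e.g. application_1341234567890_0001 (illustrative value).
    ApplicationId appId = ConverterUtils.toApplicationId(args[0]);

    YarnClient yarnClient = YarnClient.createYarnClient();
    yarnClient.init(new Configuration());
    yarnClient.start();
    try {
      ApplicationReport report = yarnClient.getApplicationReport(appId);
      // The tracking URL points at the AM while it runs and at the proxy/history
      // afterwards; the diagnostics string often shows why the AM died.
      System.out.println("State:        " + report.getYarnApplicationState());
      System.out.println("Tracking URL: " + report.getTrackingUrl());
      System.out.println("Diagnostics:  " + report.getDiagnostics());
    } finally {
      yarnClient.stop();
    }
  }
}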
                
> A failed job is not available under job history if the job is killed right around the
time job is notified as failed 
> ---------------------------------------------------------------------------------------------------------------------
>
>                 Key: MAPREDUCE-4428
>                 URL: https://issues.apache.org/jira/browse/MAPREDUCE-4428
>             Project: Hadoop Map/Reduce
>          Issue Type: Bug
>          Components: jobhistoryserver, jobtracker
>    Affects Versions: 2.0.0-alpha
>            Reporter: Rahul Jain
>
> We have observed this issue consistently when running the hadoop CDH4 version (based upon the
> 2.0 alpha release):
> When our hadoop client code gets a notification for a completed job (using a RunningJob object
> job, with job.isComplete() && job.isSuccessful()==false), the client code does an unconditional
> job.killJob() to terminate the job.
> With earlier hadoop versions (verified on hadoop 0.20.2), we still have full access to the job
> logs afterwards through the hadoop console. However, when using MapReduce V2, the failed hadoop
> job no longer shows up under the jobhistory server, and the tracking URL of the job still points
> to the non-existent ApplicationMaster http port.
> Once we removed the call to job.killJob() for failed jobs from our hadoop client code, we were
> able to access the job in job history with MapReduce V2 as well. Therefore this appears to be a
> race condition in the job management with respect to job history for failed jobs (a minimal
> sketch of this client pattern and the workaround follows after this description).
> We do have the application master and node manager logs collected for this scenario if that will
> help isolate the problem and the fix.
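
For illustration, here is a minimal sketch of the client pattern described above and of the
workaround of skipping killJob() for a job that has already completed. The class and method
names are hypothetical; only the JobClient/RunningJob calls named in the description are assumed:

import org.apache.hadoop.mapred.JobClient;
import org.apache.hadoop.mapred.JobConf;
import org.apache.hadoop.mapred.JobID;
import org.apache.hadoop.mapred.RunningJob;

public class JobWatcher {
  // Hypothetical helper: called once the client is notified that the job finished.
  static void handleCompletion(JobConf conf, JobID jobId) throws Exception {
    JobClient jobClient = new JobClient(conf);
    RunningJob job = jobClient.getJob(jobId);

    if (job.isComplete() && !job.isSuccessful()) {
      // Original (problematic) behaviour: unconditionally kill the failed job.
      // With MRv2 this races with the AM moving its job history file, so the
      // job can disappear from the history server.
      //
      //   job.killJob();
      //
      // Workaround from the description: do NOT kill a job that has already
      // completed; just report the failure and leave the history intact.
      System.err.println("Job " + jobId
          + " completed unsuccessfully; leaving it for the history server.");
    }
  }
}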

--
This message is automatically generated by JIRA.
If you think it was sent incorrectly, please contact your JIRA administrators: https://issues.apache.org/jira/secure/ContactAdministrators!default.jspa
For more information on JIRA, see: http://www.atlassian.com/software/jira

        
