hadoop-mapreduce-issues mailing list archives

From "Jason Lowe (JIRA)" <j...@apache.org>
Subject [jira] [Updated] (MAPREDUCE-4691) Historyserver can report "Unknown job" after RM says job has completed
Date Thu, 27 Sep 2012 23:13:07 GMT

     [ https://issues.apache.org/jira/browse/MAPREDUCE-4691?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel

Jason Lowe updated MAPREDUCE-4691:

    Summary: Historyserver can report "Unknown job" after RM says job has completed  (was: Historyserver can report "Unknown job" after RM says job has completed.)

There is a race condition in the historyserver when two threads scan the same user's done
intermediate directory while looking up two different jobs. The thread that wins the race
updates the user's timestamp in {{HistoryFileManager.scanIntermediateDirectory}} *before*
it has finished the scan. The second thread then sees the updated timestamp, concludes that
a rescan is unnecessary, and returns without finding its job. The client is therefore told
"Unknown job" even though the RM already reports the job as completed.

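The race can be modeled in a few lines of Java. This is a hypothetical sketch: {{UserDir}}, {{UserCache}}, {{scanBuggy}}, {{lookupBuggy}} and {{scanFixed}} are illustrative names, not Hadoop's {{HistoryFileManager}} API. The buggy variant records the timestamp before copying the directory listing; a hook replays the bad interleaving deterministically. The fixed variant shows one plausible remedy (check, scan, and timestamp update under a per-user lock, with the timestamp recorded only after the scan completes), not necessarily the patch committed for this issue.

```java
import java.util.Set;
import java.util.concurrent.ConcurrentHashMap;

// Hypothetical model of the race; names are illustrative, NOT Hadoop's HistoryFileManager API.
public class IntermediateScanSketch {

    // Stand-in for one user's done intermediate directory.
    static final class UserDir {
        final Set<String> files = ConcurrentHashMap.newKeySet();
        volatile long modTime = 0L;

        synchronized void addJob(String jobId) {
            files.add(jobId);
            modTime++; // simulate the directory's modification time changing
        }
    }

    // Per-user cache: jobs found so far and the modification time last scanned.
    static final class UserCache {
        final Set<String> knownJobs = ConcurrentHashMap.newKeySet();
        long scannedModTime = -1L;
    }

    // Buggy variant: the timestamp is recorded BEFORE the listing is copied.
    // windowHook marks the point where a concurrent lookup could interleave.
    static void scanBuggy(UserDir dir, UserCache cache, Runnable windowHook) {
        long mod = dir.modTime;
        if (mod == cache.scannedModTime) {
            return; // timestamp looks current, so the scan is skipped
        }
        cache.scannedModTime = mod;        // published too early
        windowHook.run();                  // a second caller may run here
        cache.knownJobs.addAll(dir.files); // the scan only completes now
    }

    // A second caller's lookup against the buggy variant.
    static boolean lookupBuggy(UserDir dir, UserCache cache, String jobId) {
        scanBuggy(dir, cache, () -> { });
        return cache.knownJobs.contains(jobId);
    }

    // Fixed variant: check, scan and timestamp update happen under one per-user
    // lock, and the timestamp is recorded only after the scan has completed.
    static void scanFixed(UserDir dir, UserCache cache) {
        synchronized (cache) {
            long mod = dir.modTime;
            if (mod == cache.scannedModTime) {
                return;
            }
            cache.knownJobs.addAll(dir.files);
            cache.scannedModTime = mod;
        }
    }

    public static void main(String[] args) throws InterruptedException {
        UserDir dir = new UserDir();
        dir.addJob("job_A");
        dir.addJob("job_B");

        // Deterministic replay of the bad interleaving: "thread 1" is scanning,
        // and "thread 2" looks up job_B inside the window.
        UserCache cache = new UserCache();
        boolean[] secondFound = new boolean[1];
        scanBuggy(dir, cache, () -> secondFound[0] = lookupBuggy(dir, cache, "job_B"));
        System.out.println("buggy: second caller found job_B = " + secondFound[0]);

        // Fixed variant with two real threads looking up different jobs.
        UserCache fixedCache = new UserCache();
        boolean[] found = new boolean[2];
        Thread t1 = new Thread(() -> {
            scanFixed(dir, fixedCache);
            found[0] = fixedCache.knownJobs.contains("job_A");
        });
        Thread t2 = new Thread(() -> {
            scanFixed(dir, fixedCache);
            found[1] = fixedCache.knownJobs.contains("job_B");
        });
        t1.start();
        t2.start();
        t1.join();
        t2.join();
        System.out.println("fixed: job_A=" + found[0] + ", job_B=" + found[1]);
    }
}
```

In the replay, the second caller sees a timestamp that already matches, skips the scan, and misses job_B (the "Unknown job" symptom). With the fixed variant, a caller that blocks on the lock only observes the new timestamp after the first scan has populated the cache, so both lookups succeed.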
> Historyserver can report "Unknown job" after RM says job has completed
> ----------------------------------------------------------------------
>                 Key: MAPREDUCE-4691
>                 URL: https://issues.apache.org/jira/browse/MAPREDUCE-4691
>             Project: Hadoop Map/Reduce
>          Issue Type: Bug
>          Components: jobhistoryserver, mrv2
>    Affects Versions: 0.23.3, 2.0.1-alpha
>            Reporter: Jason Lowe
>            Priority: Critical
> Example traceback from the client:
> {noformat}
> 2012-09-27 20:28:38,068 [main] INFO  org.apache.hadoop.mapred.ClientServiceDelegate - Application state is completed. FinalApplicationStatus=SUCCEEDED. Redirecting to job history
> 2012-09-27 20:28:38,530 [main] WARN  org.apache.hadoop.mapred.ClientServiceDelegate - Error from remote end: Unknown job job_1348097917603_3019
> 2012-09-27 20:28:38,530 [main] ERROR org.apache.hadoop.security.UserGroupInformation - PriviledgedActionException as:xxx (auth:KERBEROS) cause:org.apache.hadoop.yarn.exceptions.impl.pb.YarnRemoteExceptionPBImpl: Unknown job job_1348097917603_3019
> 2012-09-27 20:28:38,531 [main] WARN  org.apache.pig.tools.pigstats.JobStats - Failed to get map task report
> RemoteTrace: 
>  at LocalTrace: 
>         org.apache.hadoop.yarn.exceptions.impl.pb.YarnRemoteExceptionPBImpl: Unknown job job_1348097917603_3019
>         at org.apache.hadoop.yarn.ipc.ProtoOverHadoopRpcEngine$Invoker.invoke(ProtoOverHadoopRpcEngine.java:156)
>         at $Proxy11.getJobReport(Unknown Source)
>         at org.apache.hadoop.mapreduce.v2.api.impl.pb.client.MRClientProtocolPBClientImpl.getJobReport(MRClientProtocolPBClientImpl.java:116)
>         at sun.reflect.GeneratedMethodAccessor6.invoke(Unknown Source)
>         at sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:25)
>         at java.lang.reflect.Method.invoke(Method.java:597)
>         at org.apache.hadoop.mapred.ClientServiceDelegate.invoke(ClientServiceDelegate.java:298)
>         at org.apache.hadoop.mapred.ClientServiceDelegate.getJobStatus(ClientServiceDelegate.java:383)
>         at org.apache.hadoop.mapred.YARNRunner.getJobStatus(YARNRunner.java:482)
>         at org.apache.hadoop.mapreduce.Cluster.getJob(Cluster.java:184)
> ...
> {noformat}

This message is automatically generated by JIRA.
