Date: Fri, 28 Sep 2012 10:13:07 +1100 (NCT)
From: "Jason Lowe (JIRA)"
To: mapreduce-issues@hadoop.apache.org
Reply-To: mapreduce-issues@hadoop.apache.org
Message-ID: <654666087.136773.1348787587653.JavaMail.jiratomcat@arcas>
In-Reply-To: <1257243515.136611.1348786027829.JavaMail.jiratomcat@arcas>
Subject: [jira] [Updated] (MAPREDUCE-4691) Historyserver can report "Unknown job" after RM says job has completed

     [ https://issues.apache.org/jira/browse/MAPREDUCE-4691?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Jason Lowe updated MAPREDUCE-4691:
----------------------------------
    Summary: Historyserver can report "Unknown job" after RM says job has completed  (was: Historyserver can report "Unknown job" after RM says job has completed.)

There is a race condition in the historyserver where two threads can be trying to scan the same user's done intermediate directory for two separate jobs. One thread will win the race and update the user timestamp in {{HistoryFileManager.scanIntermediateDirectory}} *before* it has actually completed the scan. The second thread will then see the timestamp has been updated, think there's no point in doing a scan, and return with no job found.

> Historyserver can report "Unknown job" after RM says job has completed
> ----------------------------------------------------------------------
>
>                 Key: MAPREDUCE-4691
>                 URL: https://issues.apache.org/jira/browse/MAPREDUCE-4691
>             Project: Hadoop Map/Reduce
>          Issue Type: Bug
>          Components: jobhistoryserver, mrv2
>    Affects Versions: 0.23.3, 2.0.1-alpha
>            Reporter: Jason Lowe
>            Priority: Critical
>
> Example traceback from the client:
> {noformat}
> 2012-09-27 20:28:38,068 [main] INFO  org.apache.hadoop.mapred.ClientServiceDelegate - Application state is completed. FinalApplicationStatus=SUCCEEDED. Redirecting to job history server
> 2012-09-27 20:28:38,530 [main] WARN  org.apache.hadoop.mapred.ClientServiceDelegate - Error from remote end: Unknown job job_1348097917603_3019
> 2012-09-27 20:28:38,530 [main] ERROR org.apache.hadoop.security.UserGroupInformation - PriviledgedActionException as:xxx (auth:KERBEROS) cause:org.apache.hadoop.yarn.exceptions.impl.pb.YarnRemoteExceptionPBImpl: Unknown job job_1348097917603_3019
> 2012-09-27 20:28:38,531 [main] WARN  org.apache.pig.tools.pigstats.JobStats - Failed to get map task report
> RemoteTrace:
>  at LocalTrace:
>     org.apache.hadoop.yarn.exceptions.impl.pb.YarnRemoteExceptionPBImpl: Unknown job job_1348097917603_3019
>     at org.apache.hadoop.yarn.ipc.ProtoOverHadoopRpcEngine$Invoker.invoke(ProtoOverHadoopRpcEngine.java:156)
>     at $Proxy11.getJobReport(Unknown Source)
>     at org.apache.hadoop.mapreduce.v2.api.impl.pb.client.MRClientProtocolPBClientImpl.getJobReport(MRClientProtocolPBClientImpl.java:116)
>     at sun.reflect.GeneratedMethodAccessor6.invoke(Unknown Source)
>     at sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:25)
>     at java.lang.reflect.Method.invoke(Method.java:597)
>     at org.apache.hadoop.mapred.ClientServiceDelegate.invoke(ClientServiceDelegate.java:298)
>     at org.apache.hadoop.mapred.ClientServiceDelegate.getJobStatus(ClientServiceDelegate.java:383)
>     at org.apache.hadoop.mapred.YARNRunner.getJobStatus(YARNRunner.java:482)
>     at org.apache.hadoop.mapreduce.Cluster.getJob(Cluster.java:184)
>     ...
> {noformat}

--
This message is automatically generated by JIRA.
If you think it was sent incorrectly, please contact your JIRA administrators
For more information on JIRA, see: http://www.atlassian.com/software/jira
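
For readers who want to see the ordering problem described in this report in isolation, here is a minimal sketch. It is not the Hadoop source and not necessarily how the issue was fixed: the class {{ScanRaceSketch}} and its methods {{addHistoryFile}}, {{racyFind}} and {{saferFind}} are hypothetical names invented for illustration, and the second thread is simulated by a callback that runs in the window between the timestamp update and the scan, so the interleaving is reproducible.

```java
import java.util.HashSet;
import java.util.Set;

/** Hypothetical model of one user's done intermediate directory; not Hadoop code. */
public class ScanRaceSketch {
    private final Set<String> filesOnDisk = new HashSet<>(); // history files written by finished jobs
    private final Set<String> indexedJobs = new HashSet<>(); // jobs the server has discovered so far
    private long dirModTime = 0;          // bumped whenever a file is added
    private long lastScannedModTime = -1; // stands in for the "user timestamp" in the report

    public synchronized void addHistoryFile(String jobId) {
        filesOnDisk.add(jobId);
        dirModTime++;
    }

    /** Racy ordering: the timestamp is recorded BEFORE the scan has run. */
    public boolean racyFind(String jobId, Runnable whileScanning) {
        boolean needScan;
        synchronized (this) {
            needScan = dirModTime != lastScannedModTime;
            if (needScan) {
                lastScannedModTime = dirModTime; // too early: the scan has not happened yet
            }
        }
        if (needScan) {
            whileScanning.run(); // window in which another lookup can arrive
            synchronized (this) {
                indexedJobs.addAll(filesOnDisk); // the actual "scan"
            }
        }
        synchronized (this) {
            return indexedJobs.contains(jobId);
        }
    }

    /** Safer ordering (an assumption, one possible remedy): check, scan and update under one lock. */
    public synchronized boolean saferFind(String jobId) {
        if (dirModTime != lastScannedModTime) {
            indexedJobs.addAll(filesOnDisk);
            lastScannedModTime = dirModTime; // recorded only after the scan completed
        }
        return indexedJobs.contains(jobId);
    }

    public static void main(String[] args) {
        ScanRaceSketch racy = new ScanRaceSketch();
        racy.addHistoryFile("job_1");
        racy.addHistoryFile("job_2");
        boolean[] second = new boolean[1];
        // The callback stands in for a second thread arriving mid-scan.
        boolean first = racy.racyFind("job_1",
                () -> second[0] = racy.racyFind("job_2", () -> { }));
        System.out.println("racy:  first=" + first + " second=" + second[0]);

        ScanRaceSketch safer = new ScanRaceSketch();
        safer.addHistoryFile("job_1");
        safer.addHistoryFile("job_2");
        System.out.println("safer: first=" + safer.saferFind("job_1")
                + " second=" + safer.saferFind("job_2"));
    }
}
```

In the racy variant the second lookup sees the timestamp already updated, skips the scan, and reports the job as not found even though its file is present, which corresponds to the "Unknown job" symptom above. In the safer variant a concurrent caller would block on the lock until the first scan finishes, so both lookups succeed.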