Date: Wed, 11 Apr 2012 09:10:24 +0000 (UTC)
From: "Alejandro Abdelnur (Resolved) (JIRA)"
To: mapreduce-issues@hadoop.apache.org
Reply-To: mapreduce-issues@hadoop.apache.org
Message-ID: <544209837.11556.1334135424673.JavaMail.tomcat@hel.zones.apache.org>
In-Reply-To: <142713918.16330.1333617813201.JavaMail.tomcat@hel.zones.apache.org>
Subject: [jira] [Resolved] (MAPREDUCE-4109) availability of a job info in HS should be atomic

     [ https://issues.apache.org/jira/browse/MAPREDUCE-4109?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Alejandro Abdelnur resolved MAPREDUCE-4109.
-------------------------------------------

    Resolution: Invalid

After looking at the code, my assumptions proved incorrect; such a scenario is not possible. What may be happening is MAPREDUCE-3972.

> availability of a job info in HS should be atomic
> -------------------------------------------------
>
>                 Key: MAPREDUCE-4109
>                 URL: https://issues.apache.org/jira/browse/MAPREDUCE-4109
>             Project: Hadoop Map/Reduce
>          Issue Type: Bug
>          Components: applicationmaster, jobhistoryserver, mrv2
>    Affects Versions: 2.0.0
>            Reporter: Alejandro Abdelnur
>            Priority: Blocker
>             Fix For: 2.0.0
>
>
> It seems that the HS starts serving info about a job before it has all the info available.
> In the trace below, a RunningJob throws an NPE when trying to access the counters.
> This is happening on and off, so I assume it is related to either the AM not flushing all job info to HDFS before notifying the HS, or the HS not loading all the job info from HDFS before it starts serving it.
> In case it helps to diagnose the issue, this is happening in a secure cluster.
> This makes Oozie mark jobs as failed.
> {code}
> java.lang.NullPointerException
>     at org.apache.hadoop.mapreduce.v2.hs.HistoryClientService$MRClientProtocolHandler.getCounters(HistoryClientService.java:214)
>     at org.apache.hadoop.mapreduce.v2.api.impl.pb.service.MRClientProtocolPBServiceImpl.getCounters(MRClientProtocolPBServiceImpl.java:149)
>     at org.apache.hadoop.yarn.proto.MRClientProtocol$MRClientProtocolService$2.callBlockingMethod(MRClientProtocol.java:206)
>     at org.apache.hadoop.yarn.ipc.ProtoOverHadoopRpcEngine$Server.call(ProtoOverHadoopRpcEngine.java:355)
>     at org.apache.hadoop.ipc.Server$Handler$1.run(Server.java:1660)
>     at org.apache.hadoop.ipc.Server$Handler$1.run(Server.java:1656)
>     at java.security.AccessController.doPrivileged(Native Method)
>     at javax.security.auth.Subject.doAs(Subject.java:396)
>     at org.apache.hadoop.security.UserGroupInformation.doAs(UserGroupInformation.java:1177)
>     at org.apache.hadoop.ipc.Server$Handler.run(Server.java:1654)
>     at LocalTrace:
>     org.apache.hadoop.yarn.exceptions.impl.pb.YarnRemoteExceptionPBImpl:
>     at org.apache.hadoop.yarn.ipc.ProtoOverHadoopRpcEngine$Invoker.invoke(ProtoOverHadoopRpcEngine.java:163)
>     at $Proxy31.getCounters(Unknown Source)
>     at org.apache.hadoop.mapreduce.v2.api.impl.pb.client.MRClientProtocolPBClientImpl.getCounters(MRClientProtocolPBClientImpl.java:162)
>     at sun.reflect.NativeMethodAccessorImpl.invoke0(Native Method)
>     at sun.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:57)
>     at sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:43)
>     at java.lang.reflect.Method.invoke(Method.java:616)
>     at org.apache.hadoop.mapred.ClientServiceDelegate.invoke(ClientServiceDelegate.java:296)
>     at org.apache.hadoop.mapred.ClientServiceDelegate.getJobCounters(ClientServiceDelegate.java:325)
>     at org.apache.hadoop.mapred.YARNRunner.getJobCounters(YARNRunner.java:472)
>     at org.apache.hadoop.mapreduce.Job$8.run(Job.java:714)
>     at org.apache.hadoop.mapreduce.Job$8.run(Job.java:711)
>     at java.security.AccessController.doPrivileged(Native Method)
>     at javax.security.auth.Subject.doAs(Subject.java:416)
>     at org.apache.hadoop.security.UserGroupInformation.doAs(UserGroupInformation.java:1177)
>     at org.apache.hadoop.mapreduce.Job.getCounters(Job.java:711)
>     at org.apache.hadoop.mapred.JobClient$NetworkedJob.getCounters(JobClient.java:396)
>     at org.apache.oozie.action.hadoop.LauncherMapper.hasIdSwap(LauncherMapper.java:296)
>     at org.apache.oozie.action.hadoop.JavaActionExecutor.check(JavaActionExecutor.java:886)
>     at org.apache.oozie.command.wf.ActionCheckXCommand.execute(ActionCheckXCommand.java:162)
>     at org.apache.oozie.command.wf.ActionCheckXCommand.execute(ActionCheckXCommand.java:51)
>     at org.apache.oozie.command.XCommand.call(XCommand.java:260)
>     at org.apache.oozie.service.CallableQueueService$CallableWrapper.run(CallableQueueService.java:166)
>     at java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1110)
>     at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:603)
>     at java.lang.Thread.run(Thread.java:679)
> {code}
>

--
This message is automatically generated by JIRA.
If you think it was sent incorrectly, please contact your JIRA administrators: https://issues.apache.org/jira/secure/ContactAdministrators!default.jspa
For more information on JIRA, see: http://www.atlassian.com/software/jira