Date: Wed, 1 Mar 2017 02:28:46 +0000 (UTC)
From: "Jian He (JIRA)"
To: mapreduce-issues@hadoop.apache.org
Subject: [jira] [Commented] (MAPREDUCE-6852) Job#updateStatus() failed with NPE due to race condition

    [ https://issues.apache.org/jira/browse/MAPREDUCE-6852?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15889376#comment-15889376 ]

Jian He commented on MAPREDUCE-6852:
------------------------------------

lgtm, committing tomorrow

> Job#updateStatus() failed with NPE due to race condition
> --------------------------------------------------------
>
>                 Key: MAPREDUCE-6852
>                 URL: https://issues.apache.org/jira/browse/MAPREDUCE-6852
>             Project: Hadoop Map/Reduce
>          Issue Type: Bug
>            Reporter: Junping Du
>            Assignee: Junping Du
>         Attachments: MAPREDUCE-6852.patch, MAPREDUCE-6852-v2.patch
>
>
> Like MAPREDUCE-6762, we found this issue in a cluster where Pig query occasionally failed on NPE - "Pig uses JobControl API to track MR job status, but sometimes Job History Server failed to flush job meta files to HDFS which caused the status update failed."
> Besides the NPE in o.a.h.mapreduce.Job.getJobName, we also get an NPE in Job.updateStatus(), and the exception is as follows:
> {noformat}
> Caused by: java.lang.NullPointerException
>         at org.apache.hadoop.mapreduce.Job$1.run(Job.java:323)
>         at org.apache.hadoop.mapreduce.Job$1.run(Job.java:320)
>         at java.security.AccessController.doPrivileged(Native Method)
>         at javax.security.auth.Subject.doAs(Subject.java:422)
>         at org.apache.hadoop.security.UserGroupInformation.doAs(UserGroupInformation.java:1833)
>         at org.apache.hadoop.mapreduce.Job.updateStatus(Job.java:320)
>         at org.apache.hadoop.mapreduce.Job.isComplete(Job.java:604)
> {noformat}
> We found that state here is null. However, we already check that the job state is RUNNING, as in the code below:
> {noformat}
>   public boolean isComplete() throws IOException {
>     ensureState(JobState.RUNNING);
>     updateStatus();
>     return status.isJobComplete();
>   }
> {noformat}
> The only possible reason is that two threads are calling this at the same time: both ensure the state first, then one thread updates the state to null while the other thread hits the NPE.
> We should fix this NPE.



--
This message was sent by Atlassian JIRA
(v6.3.15#6346)

---------------------------------------------------------------------
To unsubscribe, e-mail: mapreduce-issues-unsubscribe@hadoop.apache.org
For additional commands, e-mail: mapreduce-issues-help@hadoop.apache.org
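
For readers of the archive, the check-then-act pattern described in the quoted report can be sketched in isolation. This is only an illustration, not the attached patch and not the Hadoop Job class: StatusTracker, Status, and the lookup supplier are hypothetical names introduced here. The sketch assumes the lookup may return null (for example, when the status record is not available yet) and shows one possible way to avoid the NPE: refresh and read under a single lock, and check the fresh value through a local variable instead of re-reading a shared field.

```java
import java.io.IOException;
import java.util.function.Supplier;

/** Hypothetical stand-in for a job handle; this is NOT org.apache.hadoop.mapreduce.Job. */
final class StatusTracker {

    /** Hypothetical, immutable status record. */
    static final class Status {
        private final boolean complete;

        Status(boolean complete) {
            this.complete = complete;
        }

        boolean isComplete() {
            return complete;
        }
    }

    // Assumption: the lookup may return null when no status is available.
    private final Supplier<Status> lookup;

    // Shared field; only written while holding the monitor of this object.
    private Status status;

    StatusTracker(Supplier<Status> lookup) {
        this.lookup = lookup;
    }

    /**
     * Refreshes the status and reads it in one synchronized step, so another
     * thread cannot replace the field between the refresh and the read.
     * A missing status is reported as an IOException rather than an NPE.
     */
    synchronized boolean isComplete() throws IOException {
        Status fresh = lookup.get();
        if (fresh == null) {
            throw new IOException("Job status not available");
        }
        status = fresh;
        // Use the local variable, which no other thread can change.
        return fresh.isComplete();
    }
}
```

With this shape, a caller polling from several threads sees either a valid result or an IOException it can handle; whether that matches the behavior chosen in the actual patch is not stated in this message.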