hadoop-mapreduce-issues mailing list archives

From "Hudson (JIRA)" <j...@apache.org>
Subject [jira] [Commented] (MAPREDUCE-6852) Job#updateStatus() failed with NPE due to race condition
Date Thu, 02 Mar 2017 19:10:45 GMT

    [ https://issues.apache.org/jira/browse/MAPREDUCE-6852?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15892808#comment-15892808 ]

Hudson commented on MAPREDUCE-6852:
-----------------------------------

FAILURE: Integrated in Jenkins build Hadoop-trunk-Commit #11331 (See [https://builds.apache.org/job/Hadoop-trunk-Commit/11331/])
MAPREDUCE-6852. Job#updateStatus() failed with NPE due to race (jianhe: rev 747bafaf969857b66233a8b4660590bdd712ed7d)
* (edit) hadoop-mapreduce-project/hadoop-mapreduce-client/hadoop-mapreduce-client-core/src/main/java/org/apache/hadoop/mapreduce/Job.java


> Job#updateStatus() failed with NPE due to race condition
> --------------------------------------------------------
>
>                 Key: MAPREDUCE-6852
>                 URL: https://issues.apache.org/jira/browse/MAPREDUCE-6852
>             Project: Hadoop Map/Reduce
>          Issue Type: Bug
>            Reporter: Junping Du
>            Assignee: Junping Du
>             Fix For: 2.9.0
>
>         Attachments: MAPREDUCE-6852.patch, MAPREDUCE-6852-v2.patch
>
>
> As with MAPREDUCE-6762, we found this issue in a cluster where a Pig query occasionally failed with an NPE: "Pig uses the JobControl API to track MR job status, but sometimes the Job History Server failed to flush job meta files to HDFS, which caused the status update to fail." Besides the NPE in o.a.h.mapreduce.Job.getJobName, we also get an NPE in Job.updateStatus(). The exception is as follows:
> {noformat}
> Caused by: java.lang.NullPointerException
> 	at org.apache.hadoop.mapreduce.Job$1.run(Job.java:323)
> 	at org.apache.hadoop.mapreduce.Job$1.run(Job.java:320)
> 	at java.security.AccessController.doPrivileged(Native Method)
> 	at javax.security.auth.Subject.doAs(Subject.java:422)
> 	at org.apache.hadoop.security.UserGroupInformation.doAs(UserGroupInformation.java:1833)
> 	at org.apache.hadoop.mapreduce.Job.updateStatus(Job.java:320)
> 	at org.apache.hadoop.mapreduce.Job.isComplete(Job.java:604)
> {noformat}
> We found that the state here is null. However, we already check that the job state is RUNNING, as in the code below:
> {noformat}
>   public boolean isComplete() throws IOException {
>     ensureState(JobState.RUNNING);
>     updateStatus();
>     return status.isJobComplete();
>   }
> {noformat}
> The only plausible explanation is that two threads call this method at the same time: both pass the ensureState(JobState.RUNNING) check, then one thread updates the state to null while the other thread hits the NPE.
> We should fix this NPE.
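
For illustration only, the race described above can be avoided along these lines. This is a minimal sketch, not the committed patch: JobHandle, Status and statusSource are hypothetical stand-ins for the real Job, JobStatus and cluster client. It assumes the fix is to fetch the new status into a local variable, reject null with an IOException, and serialize the update with synchronized.

```java
import java.io.IOException;
import java.util.function.Supplier;

/** Hypothetical stand-in for a job handle; not org.apache.hadoop.mapreduce.Job. */
class JobHandle {
    /** Hypothetical status type standing in for the real JobStatus. */
    enum Status { RUNNING, SUCCEEDED }

    // Hypothetical status source; it may return null when the status is unavailable.
    private final Supplier<Status> statusSource;
    private Status status = Status.RUNNING;

    JobHandle(Supplier<Status> statusSource) {
        this.statusSource = statusSource;
    }

    /** Refreshes the cached status; synchronized so the fetch and the assignment are not interleaved. */
    synchronized void updateStatus() throws IOException {
        Status fetched = statusSource.get();
        if (fetched == null) {
            // Fail with a descriptive checked exception and keep the last good status.
            throw new IOException("Job status not available");
        }
        status = fetched;
    }

    /** Holds the same lock as updateStatus(), so it never reads a half-updated status. */
    synchronized boolean isComplete() throws IOException {
        updateStatus();
        return status == Status.SUCCEEDED;
    }
}
```

With this shape, a caller whose status source returns null gets an IOException it can handle instead of an NPE, and the cached status is never overwritten with null.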



--
This message was sent by Atlassian JIRA
(v6.3.15#6346)

---------------------------------------------------------------------
To unsubscribe, e-mail: mapreduce-issues-unsubscribe@hadoop.apache.org
For additional commands, e-mail: mapreduce-issues-help@hadoop.apache.org

