hive-dev mailing list archives

Site index · List index
Message view « Date » · « Thread »
Top « Date » · « Thread »
From "zhihai xu (JIRA)" <j...@apache.org>
Subject [jira] [Created] (HIVE-14303) CommonJoinOperator.checkAndGenObject should return directly in CLOSE state to avoid NPE if ExecReducer.close is called twice.
Date Thu, 21 Jul 2016 03:17:20 GMT
zhihai xu created HIVE-14303:
--------------------------------

             Summary: CommonJoinOperator.checkAndGenObject should return directly in CLOSE
state to avoid NPE if ExecReducer.close is called twice.
                 Key: HIVE-14303
                 URL: https://issues.apache.org/jira/browse/HIVE-14303
             Project: Hive
          Issue Type: Bug
            Reporter: zhihai xu
            Assignee: zhihai xu
             Fix For: 2.1.0


CommonJoinOperator.checkAndGenObject should return directly in CLOSE state to avoid NPE if
ExecReducer.close is called twice. ExecReducer.close implements Closeable interface and ExecReducer.close
can be called multiple time. We saw the following NPE which hide the real exception due to
this bug.
{code}
Error: java.lang.RuntimeException: Hive Runtime Error while closing operators: null

        at org.apache.hadoop.hive.ql.exec.mr.ExecReducer.close(ExecReducer.java:296)

        at org.apache.hadoop.io.IOUtils.cleanup(IOUtils.java:244)

        at org.apache.hadoop.mapred.ReduceTask.runOldReducer(ReduceTask.java:459)

        at org.apache.hadoop.mapred.ReduceTask.run(ReduceTask.java:392)

        at org.apache.hadoop.mapred.YarnChild$2.run(YarnChild.java:163)

        at java.security.AccessController.doPrivileged(Native Method)

        at javax.security.auth.Subject.doAs(Subject.java:415)

        at org.apache.hadoop.security.UserGroupInformation.doAs(UserGroupInformation.java:1671)

        at org.apache.hadoop.mapred.YarnChild.main(YarnChild.java:158)

Caused by: java.lang.NullPointerException

        at org.apache.hadoop.hive.ql.exec.CommonJoinOperator.checkAndGenObject(CommonJoinOperator.java:718)

        at org.apache.hadoop.hive.ql.exec.JoinOperator.endGroup(JoinOperator.java:256)

        at org.apache.hadoop.hive.ql.exec.mr.ExecReducer.close(ExecReducer.java:284)

        ... 8 more
{code}
The code from ReduceTask.runOldReducer:
{code}
      reducer.close(); //line 453
      reducer = null;
      
      out.close(reporter);
      out = null;
    } finally {
      IOUtils.cleanup(LOG, reducer);// line 459
      closeQuietly(out, reporter);
    }
{code}
Based on the above stack trace and code, reducer.close() is called twice because the exception
happened when reducer.close() is called for the first time at line 453, the code exit before
reducer was set to null. NullPointerException is triggered when reducer.close() is called
for the second time in IOUtils.cleanup. NullPointerException hide the real exception which
happened when reducer.close() is called for the first time at line 453.
The reason for NPE is:
The first reducer.close called CommonJoinOperator.closeOp which clear {{storage}}
{code}
Arrays.fill(storage, null);
{code}
the second reduce.close generated NPE due to null {{storage[alias]}} which is set to null
by first reducer.close.
The following reducer log can give more proof:
{code}
2016-07-14 22:24:51,016 INFO [main] org.apache.hadoop.hive.ql.exec.JoinOperator: 0 finished.
closing... 
2016-07-14 22:24:51,016 INFO [main] org.apache.hadoop.hive.ql.exec.JoinOperator: 0 finished.
closing... 
2016-07-14 22:24:51,016 INFO [main] org.apache.hadoop.hive.ql.exec.JoinOperator: SKEWJOINFOLLOWUPJOBS:0
2016-07-14 22:24:51,016 INFO [main] org.apache.hadoop.hive.ql.exec.SelectOperator: 1 finished.
closing... 
2016-07-14 22:24:51,016 INFO [main] org.apache.hadoop.hive.ql.exec.SelectOperator: 2 finished.
closing... 
2016-07-14 22:24:51,016 INFO [main] org.apache.hadoop.hive.ql.exec.SelectOperator: 3 finished.
closing... 
2016-07-14 22:24:51,016 INFO [main] org.apache.hadoop.hive.ql.exec.FileSinkOperator: 4 finished.
closing... 
2016-07-14 22:24:51,016 INFO [main] org.apache.hadoop.hive.ql.exec.FileSinkOperator: FS[4]:
records written - 53466
2016-07-14 22:25:11,555 ERROR [main] ExecReducer: Hit error while closing operators - failing
tree
2016-07-14 22:25:11,649 WARN [main] org.apache.hadoop.mapred.YarnChild: Exception running
child : java.lang.RuntimeException: Hive Runtime Error while closing operators: null
	at org.apache.hadoop.hive.ql.exec.mr.ExecReducer.close(ExecReducer.java:296)
	at org.apache.hadoop.io.IOUtils.cleanup(IOUtils.java:244)
	at org.apache.hadoop.mapred.ReduceTask.runOldReducer(ReduceTask.java:459)
	at org.apache.hadoop.mapred.ReduceTask.run(ReduceTask.java:392)
	at org.apache.hadoop.mapred.YarnChild$2.run(YarnChild.java:163)
	at java.security.AccessController.doPrivileged(Native Method)
	at javax.security.auth.Subject.doAs(Subject.java:415)
	at org.apache.hadoop.security.UserGroupInformation.doAs(UserGroupInformation.java:1671)
	at org.apache.hadoop.mapred.YarnChild.main(YarnChild.java:158)
Caused by: java.lang.NullPointerException
	at org.apache.hadoop.hive.ql.exec.CommonJoinOperator.checkAndGenObject(CommonJoinOperator.java:718)
	at org.apache.hadoop.hive.ql.exec.JoinOperator.endGroup(JoinOperator.java:256)
	at org.apache.hadoop.hive.ql.exec.mr.ExecReducer.close(ExecReducer.java:284)
	... 8 more
{code}



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)

Mime
View raw message