hive-issues mailing list archives

Site index · List index
Message view « Date » · « Thread »
Top « Date » · « Thread »
From "zhihai xu (JIRA)" <j...@apache.org>
Subject [jira] [Comment Edited] (HIVE-14303) CommonJoinOperator.checkAndGenObject should return directly at CLOSE state to avoid NPE if ExecReducer.close is called twice.
Date Thu, 28 Jul 2016 03:26:20 GMT

    [ https://issues.apache.org/jira/browse/HIVE-14303?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15396851#comment-15396851
] 

zhihai xu edited comment on HIVE-14303 at 7/28/16 3:26 AM:
-----------------------------------------------------------

Thanks for the review [~xuefuz]! Currently {{IOUtils.cleanup}} already checks if reducer is
null, but the reducer is not null, because the exception happened inside {{reducer.close()}}
at line 453 and it will exit before the reducer is set to null at line 454. Also {{ExecReducer}}
didn't follow {{Closeable}} interface rule: https://docs.oracle.com/javase/7/docs/api/java/io/Closeable.html#close()

{code}
  public static void cleanup(Log log, java.io.Closeable... closeables) {
    for (java.io.Closeable c : closeables) {
      if (c != null) {// check if reducer is null here
        try {
          c.close();
        } catch(IOException e) {
          if (log != null && log.isDebugEnabled()) {
            log.debug("Exception in closing " + c, e);
          }
        }
      }
    }
  }
{code}


was (Author: zxu):
Thanks for the review [~xuefuz]! Currently {{IOUtils.cleanup}} already checks if reducer is
null, but the reducer is not null, because the exception happened inside {{reducer.close()}}
at line 453 and it will exit before the reducer is set to null at line 454. Also {{ExecReducer}}
didn't follow {{Closeable}} interface rule:https://docs.oracle.com/javase/7/docs/api/java/io/Closeable.html#close()

{code}
  public static void cleanup(Log log, java.io.Closeable... closeables) {
    for (java.io.Closeable c : closeables) {
      if (c != null) {// check if reducer is null here
        try {
          c.close();
        } catch(IOException e) {
          if (log != null && log.isDebugEnabled()) {
            log.debug("Exception in closing " + c, e);
          }
        }
      }
    }
  }
{code}

> CommonJoinOperator.checkAndGenObject should return directly at CLOSE state to avoid NPE
if ExecReducer.close is called twice.
> -----------------------------------------------------------------------------------------------------------------------------
>
>                 Key: HIVE-14303
>                 URL: https://issues.apache.org/jira/browse/HIVE-14303
>             Project: Hive
>          Issue Type: Bug
>            Reporter: zhihai xu
>            Assignee: zhihai xu
>             Fix For: 2.1.0
>
>         Attachments: HIVE-14303.0.patch
>
>
> CommonJoinOperator.checkAndGenObject should return directly at CLOSE state to avoid NPE
if ExecReducer.close is called twice. ExecReducer.close implements Closeable interface and
ExecReducer.close can be called multiple time. We saw the following NPE which hide the real
exception due to this bug.
> {code}
> Error: java.lang.RuntimeException: Hive Runtime Error while closing operators: null
>         at org.apache.hadoop.hive.ql.exec.mr.ExecReducer.close(ExecReducer.java:296)
>         at org.apache.hadoop.io.IOUtils.cleanup(IOUtils.java:244)
>         at org.apache.hadoop.mapred.ReduceTask.runOldReducer(ReduceTask.java:459)
>         at org.apache.hadoop.mapred.ReduceTask.run(ReduceTask.java:392)
>         at org.apache.hadoop.mapred.YarnChild$2.run(YarnChild.java:163)
>         at java.security.AccessController.doPrivileged(Native Method)
>         at javax.security.auth.Subject.doAs(Subject.java:415)
>         at org.apache.hadoop.security.UserGroupInformation.doAs(UserGroupInformation.java:1671)
>         at org.apache.hadoop.mapred.YarnChild.main(YarnChild.java:158)
> Caused by: java.lang.NullPointerException
>         at org.apache.hadoop.hive.ql.exec.CommonJoinOperator.checkAndGenObject(CommonJoinOperator.java:718)
>         at org.apache.hadoop.hive.ql.exec.JoinOperator.endGroup(JoinOperator.java:256)
>         at org.apache.hadoop.hive.ql.exec.mr.ExecReducer.close(ExecReducer.java:284)
>         ... 8 more
> {code}
> The code from ReduceTask.runOldReducer:
> {code}
>       reducer.close(); //line 453
>       reducer = null;
>       
>       out.close(reporter);
>       out = null;
>     } finally {
>       IOUtils.cleanup(LOG, reducer);// line 459
>       closeQuietly(out, reporter);
>     }
> {code}
> Based on the above stack trace and code, reducer.close() is called twice because the
exception happened when reducer.close() is called for the first time at line 453, the code
exit before reducer was set to null. NullPointerException is triggered when reducer.close()
is called for the second time in IOUtils.cleanup at line 459. NullPointerException hide the
real exception which happened when reducer.close() is called for the first time at line 453.
> The reason for NPE is:
> The first reducer.close called CommonJoinOperator.closeOp which clear {{storage}}
> {code}
> Arrays.fill(storage, null);
> {code}
> the second reduce.close generated NPE due to null {{storage[alias]}} which is set to
null by first reducer.close.
> The following reducer log can give more proof:
> {code}
> 2016-07-14 22:24:51,016 INFO [main] org.apache.hadoop.hive.ql.exec.JoinOperator: 0 finished.
closing... 
> 2016-07-14 22:24:51,016 INFO [main] org.apache.hadoop.hive.ql.exec.JoinOperator: 0 finished.
closing... 
> 2016-07-14 22:24:51,016 INFO [main] org.apache.hadoop.hive.ql.exec.JoinOperator: SKEWJOINFOLLOWUPJOBS:0
> 2016-07-14 22:24:51,016 INFO [main] org.apache.hadoop.hive.ql.exec.SelectOperator: 1
finished. closing... 
> 2016-07-14 22:24:51,016 INFO [main] org.apache.hadoop.hive.ql.exec.SelectOperator: 2
finished. closing... 
> 2016-07-14 22:24:51,016 INFO [main] org.apache.hadoop.hive.ql.exec.SelectOperator: 3
finished. closing... 
> 2016-07-14 22:24:51,016 INFO [main] org.apache.hadoop.hive.ql.exec.FileSinkOperator:
4 finished. closing... 
> 2016-07-14 22:24:51,016 INFO [main] org.apache.hadoop.hive.ql.exec.FileSinkOperator:
FS[4]: records written - 53466
> 2016-07-14 22:25:11,555 ERROR [main] ExecReducer: Hit error while closing operators -
failing tree
> 2016-07-14 22:25:11,649 WARN [main] org.apache.hadoop.mapred.YarnChild: Exception running
child : java.lang.RuntimeException: Hive Runtime Error while closing operators: null
> 	at org.apache.hadoop.hive.ql.exec.mr.ExecReducer.close(ExecReducer.java:296)
> 	at org.apache.hadoop.io.IOUtils.cleanup(IOUtils.java:244)
> 	at org.apache.hadoop.mapred.ReduceTask.runOldReducer(ReduceTask.java:459)
> 	at org.apache.hadoop.mapred.ReduceTask.run(ReduceTask.java:392)
> 	at org.apache.hadoop.mapred.YarnChild$2.run(YarnChild.java:163)
> 	at java.security.AccessController.doPrivileged(Native Method)
> 	at javax.security.auth.Subject.doAs(Subject.java:415)
> 	at org.apache.hadoop.security.UserGroupInformation.doAs(UserGroupInformation.java:1671)
> 	at org.apache.hadoop.mapred.YarnChild.main(YarnChild.java:158)
> Caused by: java.lang.NullPointerException
> 	at org.apache.hadoop.hive.ql.exec.CommonJoinOperator.checkAndGenObject(CommonJoinOperator.java:718)
> 	at org.apache.hadoop.hive.ql.exec.JoinOperator.endGroup(JoinOperator.java:256)
> 	at org.apache.hadoop.hive.ql.exec.mr.ExecReducer.close(ExecReducer.java:284)
> 	... 8 more
> {code}



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)

Mime
View raw message