hive-issues mailing list archives

From "Prasanth Jayachandran (JIRA)" <j...@apache.org>
Subject [jira] [Commented] (HIVE-12712) HiveInputFormat may fail to set column names to read in some cases
Date Mon, 21 Dec 2015 23:58:46 GMT

    [ https://issues.apache.org/jira/browse/HIVE-12712?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15067285#comment-15067285 ]

Prasanth Jayachandran commented on HIVE-12712:
----------------------------------------------

The test failures are unrelated; 18 tests are failing for other patches as well. The dynamic
partition pruning test failure depends on the JDK version: the test passes on JDK 7 and fails
on JDK 8 because of a HashMap ordering difference, which is a known issue.

> HiveInputFormat may fail to set column names to read in some cases
> ------------------------------------------------------------------
>
>                 Key: HIVE-12712
>                 URL: https://issues.apache.org/jira/browse/HIVE-12712
>             Project: Hive
>          Issue Type: Bug
>    Affects Versions: 2.0.0, 2.1.0
>            Reporter: Takahiko Saito
>            Assignee: Prasanth Jayachandran
>         Attachments: HIVE-12712.1.patch, HIVE-12712.2.patch
>
>
> The primary issue is that when the plan is generated, the pathToAliases map is populated with
> directory paths mapped to table aliases. pathToAliases.put() uses path.toString() as the map key,
> but probing uses path.toUri().toString(). This can cause probe misses when the path contains
> spaces: path.toUri() escapes the spaces, whereas path.toString() does not. As a result,
> HiveInputFormat can take a different code path that fails to set the list of columns to read
> from the source table. This caused an unexpected NPE in OrcInputFormat (after the HIVE-11705
> refactoring, which removed the null check for column names). The resulting exception is:
> {code}
> Caused by: java.lang.RuntimeException: ORC split generation failed with exception: java.lang.NullPointerException
>         at org.apache.hadoop.hive.ql.io.orc.OrcInputFormat.generateSplitsInfo(OrcInputFormat.java:1288)
>         at org.apache.hadoop.hive.ql.io.orc.OrcInputFormat.getSplits(OrcInputFormat.java:1354)
>         at org.apache.hadoop.hive.ql.io.HiveInputFormat.addSplitsForGroup(HiveInputFormat.java:367)
>         at org.apache.hadoop.hive.ql.io.HiveInputFormat.getSplits(HiveInputFormat.java:457)
>         at org.apache.hadoop.hive.ql.exec.tez.HiveSplitGenerator.initialize(HiveSplitGenerator.java:152)
>         at org.apache.tez.dag.app.dag.RootInputInitializerManager$InputInitializerCallable$1.run(RootInputInitializerManager.java:246)
>         at org.apache.tez.dag.app.dag.RootInputInitializerManager$InputInitializerCallable$1.run(RootInputInitializerManager.java:240)
>         at java.security.AccessController.doPrivileged(Native Method)
>         at javax.security.auth.Subject.doAs(Subject.java:422)
>         at org.apache.hadoop.security.UserGroupInformation.doAs(UserGroupInformation.java:1657)
>         at org.apache.tez.dag.app.dag.RootInputInitializerManager$InputInitializerCallable.call(RootInputInitializerManager.java:240)
>         at org.apache.tez.dag.app.dag.RootInputInitializerManager$InputInitializerCallable.call(RootInputInitializerManager.java:227)
>         at java.util.concurrent.FutureTask.run(FutureTask.java:266)
>         ... 3 more
> Caused by: java.util.concurrent.ExecutionException: java.lang.NullPointerException
>         at java.util.concurrent.FutureTask.report(FutureTask.java:122)
>         at java.util.concurrent.FutureTask.get(FutureTask.java:192)
>         at org.apache.hadoop.hive.ql.io.orc.OrcInputFormat.generateSplitsInfo(OrcInputFormat.java:1282)
>         ... 15 more
> Caused by: java.lang.NullPointerException
>         at org.apache.hadoop.hive.ql.io.orc.OrcInputFormat.extractNeededColNames(OrcInputFormat.java:422)
>         at org.apache.hadoop.hive.ql.io.orc.OrcInputFormat.extractNeededColNames(OrcInputFormat.java:417)
>         at org.apache.hadoop.hive.ql.io.orc.OrcInputFormat.access$2000(OrcInputFormat.java:134)
>         at org.apache.hadoop.hive.ql.io.orc.OrcInputFormat$SplitGenerator.call(OrcInputFormat.java:1072)
>         at org.apache.hadoop.hive.ql.io.orc.OrcInputFormat$SplitGenerator.call(OrcInputFormat.java:919)
>         ... 4 more
> {code}
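The key mismatch described in the issue can be reproduced without Hadoop. The sketch below is a minimal, standalone illustration, not Hive code: it uses `java.net.URI` as a stand-in for Hadoop's `Path`, assuming (as the issue description states) that `path.toString()` yields the unescaped path while `path.toUri().toString()` yields the percent-encoded form. The class name, helper names, the `nn:8020` authority and the partition path are all hypothetical.

```java
import java.net.URI;
import java.net.URISyntaxException;
import java.util.Collections;
import java.util.HashMap;
import java.util.List;
import java.util.Map;

// Hypothetical illustration of the pathToAliases probe miss; not Hive's implementation.
public class PathKeyMismatch {

    // Builds an hdfs URI; the five-argument constructor percent-encodes illegal
    // characters such as spaces, so "a b" is stored as "a%20b".
    static URI partitionUri(String path) {
        try {
            return new URI("hdfs", "nn:8020", path, null, null);
        } catch (URISyntaxException e) {
            throw new IllegalArgumentException(e);
        }
    }

    // Decoded form: stands in for path.toString() (the key used by put()).
    static String unescapedKey(URI uri) {
        return uri.getScheme() + "://" + uri.getAuthority() + uri.getPath();
    }

    // Encoded form: stands in for path.toUri().toString() (the key used when probing).
    static String escapedKey(URI uri) {
        return uri.toString();
    }

    public static void main(String[] args) {
        URI partition = partitionUri("/warehouse/t/part=a b");

        // Plan time: the map is populated with the unescaped key.
        Map<String, List<String>> pathToAliases = new HashMap<>();
        pathToAliases.put(unescapedKey(partition), Collections.singletonList("t"));

        System.out.println(escapedKey(partition));                      // hdfs://nn:8020/warehouse/t/part=a%20b
        // Probe time: looking up with the escaped key misses.
        System.out.println(pathToAliases.get(escapedKey(partition)));   // null
        // Probing with the same representation used at plan time hits.
        System.out.println(pathToAliases.get(unescapedKey(partition))); // [t]
    }
}
```

The point of the sketch is only that both sides of the map must use one string representation of the path; which representation the actual patch standardizes on is not shown here.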



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)
