hive-issues mailing list archives

From "Prasanth Jayachandran (JIRA)" <j...@apache.org>
Subject [jira] [Updated] (HIVE-12712) HiveInputFormat may fail to set column names to read in some cases
Date Sat, 19 Dec 2015 00:35:46 GMT

     [ https://issues.apache.org/jira/browse/HIVE-12712?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Prasanth Jayachandran updated HIVE-12712:
-----------------------------------------
    Reporter: Takahiko Saito  (was: Prasanth Jayachandran)

> HiveInputFormat may fail to set column names to read in some cases
> ------------------------------------------------------------------
>
>                 Key: HIVE-12712
>                 URL: https://issues.apache.org/jira/browse/HIVE-12712
>             Project: Hive
>          Issue Type: Bug
>    Affects Versions: 2.0.0, 2.1.0
>            Reporter: Takahiko Saito
>            Assignee: Prasanth Jayachandran
>
> The primary issue is that, when the plan is generated, the pathToAliases map is populated
> with directory paths mapped to table aliases. pathToAliases.put() uses path.toString() as the
> map key, but during probing path.toUri().toString() is used. This can cause probe misses when
> the path contains spaces: path.toUri() escapes the spaces in the path, whereas path.toString()
> does not. As a result, HiveInputFormat can take a different code path that fails to set the
> list of columns to read from the source table. This caused an unexpected NPE in OrcInputFormat
> after the HIVE-11705 refactoring, which removed the null check for column names. The resulting
> exception is:
> {code}
> Caused by: java.lang.RuntimeException: ORC split generation failed with exception: java.lang.NullPointerException
>         at org.apache.hadoop.hive.ql.io.orc.OrcInputFormat.generateSplitsInfo(OrcInputFormat.java:1288)
>         at org.apache.hadoop.hive.ql.io.orc.OrcInputFormat.getSplits(OrcInputFormat.java:1354)
>         at org.apache.hadoop.hive.ql.io.HiveInputFormat.addSplitsForGroup(HiveInputFormat.java:367)
>         at org.apache.hadoop.hive.ql.io.HiveInputFormat.getSplits(HiveInputFormat.java:457)
>         at org.apache.hadoop.hive.ql.exec.tez.HiveSplitGenerator.initialize(HiveSplitGenerator.java:152)
>         at org.apache.tez.dag.app.dag.RootInputInitializerManager$InputInitializerCallable$1.run(RootInputInitializerManager.java:246)
>         at org.apache.tez.dag.app.dag.RootInputInitializerManager$InputInitializerCallable$1.run(RootInputInitializerManager.java:240)
>         at java.security.AccessController.doPrivileged(Native Method)
>         at javax.security.auth.Subject.doAs(Subject.java:422)
>         at org.apache.hadoop.security.UserGroupInformation.doAs(UserGroupInformation.java:1657)
>         at org.apache.tez.dag.app.dag.RootInputInitializerManager$InputInitializerCallable.call(RootInputInitializerManager.java:240)
>         at org.apache.tez.dag.app.dag.RootInputInitializerManager$InputInitializerCallable.call(RootInputInitializerManager.java:227)
>         at java.util.concurrent.FutureTask.run(FutureTask.java:266)
>         ... 3 more
> Caused by: java.util.concurrent.ExecutionException: java.lang.NullPointerException
>         at java.util.concurrent.FutureTask.report(FutureTask.java:122)
>         at java.util.concurrent.FutureTask.get(FutureTask.java:192)
>         at org.apache.hadoop.hive.ql.io.orc.OrcInputFormat.generateSplitsInfo(OrcInputFormat.java:1282)
>         ... 15 more
> Caused by: java.lang.NullPointerException
>         at org.apache.hadoop.hive.ql.io.orc.OrcInputFormat.extractNeededColNames(OrcInputFormat.java:422)
>         at org.apache.hadoop.hive.ql.io.orc.OrcInputFormat.extractNeededColNames(OrcInputFormat.java:417)
>         at org.apache.hadoop.hive.ql.io.orc.OrcInputFormat.access$2000(OrcInputFormat.java:134)
>         at org.apache.hadoop.hive.ql.io.orc.OrcInputFormat$SplitGenerator.call(OrcInputFormat.java:1072)
>         at org.apache.hadoop.hive.ql.io.orc.OrcInputFormat$SplitGenerator.call(OrcInputFormat.java:919)
>         ... 4 more
> {code}
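
The key mismatch described above can be reproduced without Hive or Hadoop on the classpath. The following is only a minimal sketch, not the actual Hive code: it assumes java.net.URI as a stand-in for Hadoop's Path, and the class name PathKeyMismatch, the helpers toUri(), planKey() and probeKey(), the host "nn:8020" and the alias "t1" are all hypothetical names chosen for illustration. planKey() mimics the unescaped path.toString() form used when the map is populated, and probeKey() mimics the escaped path.toUri().toString() form used when it is probed.

```java
import java.net.URI;
import java.net.URISyntaxException;
import java.util.ArrayList;
import java.util.HashMap;
import java.util.List;
import java.util.Map;

// Hypothetical demo class; not part of Hive.
class PathKeyMismatch {

    // Stand-in for Path.toUri(): the multi-argument URI constructor
    // percent-escapes illegal characters such as spaces.
    static URI toUri(String scheme, String authority, String path) {
        try {
            return new URI(scheme, authority, path, null, null);
        } catch (URISyntaxException e) {
            throw new IllegalArgumentException(e);
        }
    }

    // Mimics the unescaped form (assumed analogue of path.toString()):
    // getPath() returns the decoded path, so spaces stay as spaces.
    static String planKey(URI uri) {
        return uri.getScheme() + "://" + uri.getAuthority() + uri.getPath();
    }

    // Mimics the escaped form (assumed analogue of path.toUri().toString()).
    static String probeKey(URI uri) {
        return uri.toString();
    }

    public static void main(String[] args) {
        URI dir = toUri("hdfs", "nn:8020", "/warehouse/my table");

        // Simplified stand-in for the pathToAliases map built at plan time.
        Map<String, List<String>> pathToAliases = new HashMap<>();
        List<String> aliases = new ArrayList<>();
        aliases.add("t1");
        pathToAliases.put(planKey(dir), aliases);

        // Probing with the escaped key misses; the unescaped key hits.
        System.out.println("probe with escaped key:   " + pathToAliases.get(probeKey(dir)));
        System.out.println("probe with unescaped key: " + pathToAliases.get(planKey(dir)));
    }
}
```

In this sketch the escaped probe returns null because the space is stored as a literal space in one key and as %20 in the other. Deriving both the put() key and the probe key from the same string form would avoid the miss; which form the actual fix uses is not stated in this message.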



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)
