hive-issues mailing list archives

From "Jesus Camacho Rodriguez (JIRA)" <j...@apache.org>
Subject [jira] [Updated] (HIVE-16480) ORC file with empty array<double> and array<float> fails to read
Date Fri, 07 Sep 2018 00:18:00 GMT

     [ https://issues.apache.org/jira/browse/HIVE-16480?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Jesus Camacho Rodriguez updated HIVE-16480:
-------------------------------------------
       Resolution: Fixed
    Fix Version/s: 2.2.1
                   2.1.2
           Status: Resolved  (was: Patch Available)

bq. This patch applies to branch-2.1 and branch-2.2. In branch-2.3 and above Hive uses the
ORC project artifacts, so we'll need to release from ORC. Once the patch goes in, we should
start that process.

The patch has been pushed to branch-2.1 and branch-2.2. To consume a new ORC release from branch-2.3
and fix this issue in 2.3.x, a new issue can be created. Closing this issue. Thanks, [~owen.omalley].

> ORC file with empty array<double> and array<float> fails to read
> ----------------------------------------------------------------
>
>                 Key: HIVE-16480
>                 URL: https://issues.apache.org/jira/browse/HIVE-16480
>             Project: Hive
>          Issue Type: Bug
>    Affects Versions: 2.1.1, 2.2.0
>            Reporter: David Capwell
>            Assignee: Owen O'Malley
>            Priority: Major
>              Labels: pull-request-available
>             Fix For: 2.1.2, 2.2.1
>
>
> We have a schema that contains an array<double>. We were unable to read this file,
and digging into ORC suggests that the failure occurs when the array is empty.
> Here is the stack trace
> {code:title=EmptyList.log|borderStyle=solid}
> ERROR 2017-04-19 09:29:17,075 [main] [EmptyList] [line 56] Failed to work with type float
> java.io.IOException: Error reading file: /var/folders/t8/t5x1031d7mn17f6xpwnkkv_40000gn/T/1492619355819-0/file-float.orc
>   at org.apache.orc.impl.RecordReaderImpl.nextBatch(RecordReaderImpl.java:1052) ~[hive-orc-2.1.1.jar:2.1.1]
>   at org.apache.hadoop.hive.ql.io.orc.RecordReaderImpl.nextBatch(RecordReaderImpl.java:135) ~[hive-exec-2.1.1.jar:2.1.1]
>   at EmptyList.emptyList(EmptyList.java:49) ~[test-classes/:na]
>   at sun.reflect.NativeMethodAccessorImpl.invoke0(Native Method) ~[na:1.8.0_121]
>   at sun.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:62) ~[na:1.8.0_121]
>   at sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:43) ~[na:1.8.0_121]
>   at java.lang.reflect.Method.invoke(Method.java:498) ~[na:1.8.0_121]
>   at org.junit.runners.model.FrameworkMethod$1.runReflectiveCall(FrameworkMethod.java:50) [junit-4.12.jar:4.12]
>   at org.junit.internal.runners.model.ReflectiveCallable.run(ReflectiveCallable.java:12) [junit-4.12.jar:4.12]
>   at org.junit.runners.model.FrameworkMethod.invokeExplosively(FrameworkMethod.java:47) [junit-4.12.jar:4.12]
>   at org.junit.internal.runners.statements.InvokeMethod.evaluate(InvokeMethod.java:17) [junit-4.12.jar:4.12]
>   at org.junit.runners.ParentRunner.runLeaf(ParentRunner.java:325) [junit-4.12.jar:4.12]
>   at org.junit.runners.BlockJUnit4ClassRunner.runChild(BlockJUnit4ClassRunner.java:78) [junit-4.12.jar:4.12]
>   at org.junit.runners.BlockJUnit4ClassRunner.runChild(BlockJUnit4ClassRunner.java:57) [junit-4.12.jar:4.12]
>   at org.junit.runners.ParentRunner$3.run(ParentRunner.java:290) [junit-4.12.jar:4.12]
>   at org.junit.runners.ParentRunner$1.schedule(ParentRunner.java:71) [junit-4.12.jar:4.12]
>   at org.junit.runners.ParentRunner.runChildren(ParentRunner.java:288) [junit-4.12.jar:4.12]
>   at org.junit.runners.ParentRunner.access$000(ParentRunner.java:58) [junit-4.12.jar:4.12]
>   at org.junit.runners.ParentRunner$2.evaluate(ParentRunner.java:268) [junit-4.12.jar:4.12]
>   at org.junit.runners.ParentRunner.run(ParentRunner.java:363) [junit-4.12.jar:4.12]
>   at org.junit.runner.JUnitCore.run(JUnitCore.java:137) [junit-4.12.jar:4.12]
>   at com.intellij.junit4.JUnit4IdeaTestRunner.startRunnerWithArgs(JUnit4IdeaTestRunner.java:68) [junit-rt.jar:na]
>   at com.intellij.rt.execution.junit.IdeaTestRunner$Repeater.startRunnerWithArgs(IdeaTestRunner.java:51) [junit-rt.jar:na]
>   at com.intellij.rt.execution.junit.JUnitStarter.prepareStreamsAndStart(JUnitStarter.java:237) [junit-rt.jar:na]
>   at com.intellij.rt.execution.junit.JUnitStarter.main(JUnitStarter.java:70) [junit-rt.jar:na]
>   at sun.reflect.NativeMethodAccessorImpl.invoke0(Native Method) ~[na:1.8.0_121]
>   at sun.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:62) ~[na:1.8.0_121]
>   at sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:43) ~[na:1.8.0_121]
>   at java.lang.reflect.Method.invoke(Method.java:498) ~[na:1.8.0_121]
>   at com.intellij.rt.execution.application.AppMain.main(AppMain.java:147) [idea_rt.jar:na]
> Caused by: java.io.EOFException: Read past EOF for compressed stream Stream for column 1 kind DATA position: 0 length: 0 range: 0 offset: 0 limit: 0
>   at org.apache.orc.impl.SerializationUtils.readFully(SerializationUtils.java:118) ~[hive-orc-2.1.1.jar:2.1.1]
>   at org.apache.orc.impl.SerializationUtils.readFloat(SerializationUtils.java:78) ~[hive-orc-2.1.1.jar:2.1.1]
>   at org.apache.orc.impl.TreeReaderFactory$FloatTreeReader.nextVector(TreeReaderFactory.java:619) ~[hive-orc-2.1.1.jar:2.1.1]
>   at org.apache.orc.impl.TreeReaderFactory$ListTreeReader.nextVector(TreeReaderFactory.java:1902) ~[hive-orc-2.1.1.jar:2.1.1]
>   at org.apache.orc.impl.TreeReaderFactory$TreeReader.nextBatch(TreeReaderFactory.java:154) ~[hive-orc-2.1.1.jar:2.1.1]
>   at org.apache.orc.impl.RecordReaderImpl.nextBatch(RecordReaderImpl.java:1045) ~[hive-orc-2.1.1.jar:2.1.1]
>   ... 29 common frames omitted
>  INFO 2017-04-19 09:29:17,091 [main] [WriterImpl] [line 205] ORC writer created for path: /var/folders/t8/t5x1031d7mn17f6xpwnkkv_40000gn/T/1492619355819-0/file-double.orc with stripeSize: 67108864 blockSize: 268435456 compression: ZLIB bufferSize: 262144
>  INFO 2017-04-19 09:29:17,100 [main] [ReaderImpl] [line 357] Reading ORC rows from /var/folders/t8/t5x1031d7mn17f6xpwnkkv_40000gn/T/1492619355819-0/file-double.orc with {include: null, offset: 0, length: 9223372036854775807}
>  INFO 2017-04-19 09:29:17,101 [main] [RecordReaderImpl] [line 142] Schema on read not provided -- using file schema array<double>
> ERROR 2017-04-19 09:29:17,104 [main] [EmptyList] [line 56] Failed to work with type double
> java.io.IOException: Error reading file: /var/folders/t8/t5x1031d7mn17f6xpwnkkv_40000gn/T/1492619355819-0/file-double.orc
>   at org.apache.orc.impl.RecordReaderImpl.nextBatch(RecordReaderImpl.java:1052) ~[hive-orc-2.1.1.jar:2.1.1]
>   at org.apache.hadoop.hive.ql.io.orc.RecordReaderImpl.nextBatch(RecordReaderImpl.java:135) ~[hive-exec-2.1.1.jar:2.1.1]
>   at EmptyList.emptyList(EmptyList.java:49) ~[test-classes/:na]
>   at sun.reflect.NativeMethodAccessorImpl.invoke0(Native Method) ~[na:1.8.0_121]
>   at sun.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:62) ~[na:1.8.0_121]
>   at sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:43) ~[na:1.8.0_121]
>   at java.lang.reflect.Method.invoke(Method.java:498) ~[na:1.8.0_121]
>   at org.junit.runners.model.FrameworkMethod$1.runReflectiveCall(FrameworkMethod.java:50) [junit-4.12.jar:4.12]
>   at org.junit.internal.runners.model.ReflectiveCallable.run(ReflectiveCallable.java:12) [junit-4.12.jar:4.12]
>   at org.junit.runners.model.FrameworkMethod.invokeExplosively(FrameworkMethod.java:47) [junit-4.12.jar:4.12]
>   at org.junit.internal.runners.statements.InvokeMethod.evaluate(InvokeMethod.java:17) [junit-4.12.jar:4.12]
>   at org.junit.runners.ParentRunner.runLeaf(ParentRunner.java:325) [junit-4.12.jar:4.12]
>   at org.junit.runners.BlockJUnit4ClassRunner.runChild(BlockJUnit4ClassRunner.java:78) [junit-4.12.jar:4.12]
>   at org.junit.runners.BlockJUnit4ClassRunner.runChild(BlockJUnit4ClassRunner.java:57) [junit-4.12.jar:4.12]
>   at org.junit.runners.ParentRunner$3.run(ParentRunner.java:290) [junit-4.12.jar:4.12]
>   at org.junit.runners.ParentRunner$1.schedule(ParentRunner.java:71) [junit-4.12.jar:4.12]
>   at org.junit.runners.ParentRunner.runChildren(ParentRunner.java:288) [junit-4.12.jar:4.12]
>   at org.junit.runners.ParentRunner.access$000(ParentRunner.java:58) [junit-4.12.jar:4.12]
>   at org.junit.runners.ParentRunner$2.evaluate(ParentRunner.java:268) [junit-4.12.jar:4.12]
>   at org.junit.runners.ParentRunner.run(ParentRunner.java:363) [junit-4.12.jar:4.12]
>   at org.junit.runner.JUnitCore.run(JUnitCore.java:137) [junit-4.12.jar:4.12]
>   at com.intellij.junit4.JUnit4IdeaTestRunner.startRunnerWithArgs(JUnit4IdeaTestRunner.java:68) [junit-rt.jar:na]
>   at com.intellij.rt.execution.junit.IdeaTestRunner$Repeater.startRunnerWithArgs(IdeaTestRunner.java:51) [junit-rt.jar:na]
>   at com.intellij.rt.execution.junit.JUnitStarter.prepareStreamsAndStart(JUnitStarter.java:237) [junit-rt.jar:na]
>   at com.intellij.rt.execution.junit.JUnitStarter.main(JUnitStarter.java:70) [junit-rt.jar:na]
>   at sun.reflect.NativeMethodAccessorImpl.invoke0(Native Method) ~[na:1.8.0_121]
>   at sun.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:62) ~[na:1.8.0_121]
>   at sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:43) ~[na:1.8.0_121]
>   at java.lang.reflect.Method.invoke(Method.java:498) ~[na:1.8.0_121]
>   at com.intellij.rt.execution.application.AppMain.main(AppMain.java:147) [idea_rt.jar:na]
> Caused by: java.io.EOFException: Read past EOF for compressed stream Stream for column 1 kind DATA position: 0 length: 0 range: 0 offset: 0 limit: 0
>   at org.apache.orc.impl.SerializationUtils.readFully(SerializationUtils.java:118) ~[hive-orc-2.1.1.jar:2.1.1]
>   at org.apache.orc.impl.SerializationUtils.readLongLE(SerializationUtils.java:101) ~[hive-orc-2.1.1.jar:2.1.1]
>   at org.apache.orc.impl.SerializationUtils.readDouble(SerializationUtils.java:97) ~[hive-orc-2.1.1.jar:2.1.1]
>   at org.apache.orc.impl.TreeReaderFactory$DoubleTreeReader.nextVector(TreeReaderFactory.java:713) ~[hive-orc-2.1.1.jar:2.1.1]
>   at org.apache.orc.impl.TreeReaderFactory$ListTreeReader.nextVector(TreeReaderFactory.java:1902) ~[hive-orc-2.1.1.jar:2.1.1]
>   at org.apache.orc.impl.TreeReaderFactory$TreeReader.nextBatch(TreeReaderFactory.java:154) ~[hive-orc-2.1.1.jar:2.1.1]
>   at org.apache.orc.impl.RecordReaderImpl.nextBatch(RecordReaderImpl.java:1045) ~[hive-orc-2.1.1.jar:2.1.1]
>   ... 29 common frames omitted
> {code}
> If you create an ORC file with one row as follows
> {code}
> orc.addRow(Lists.newArrayList());
> {code}
> then try to read it
> {code}
> VectorizedRowBatch batch = reader.getSchema().createRowBatch();
> while (rows.nextBatch(batch)) { }
> {code}
> You will produce the above stack trace.
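
The trace in the report shows readFloat/readDouble being called on a DATA stream of length 0. The sketch below reproduces that failure mode with the JDK only; it is not the ORC reader code and not the committed patch. The class and method names are hypothetical, and the idea that the reader touches the stream before checking the element count is an assumption inferred from the trace.

```java
import java.io.ByteArrayInputStream;
import java.io.DataInputStream;
import java.io.EOFException;
import java.io.IOException;

// Hypothetical illustration only; none of these names come from ORC or Hive.
class EmptyListReadSketch {

    // Assumed buggy shape: reads the first value before looking at `count`,
    // so an empty list (count == 0) still pulls 8 bytes from the stream.
    static double[] readUnguarded(DataInputStream in, int count) throws IOException {
        double first = in.readDouble(); // throws EOFException on an empty stream
        double[] out = new double[count];
        if (count > 0) {
            out[0] = first;
        }
        for (int i = 1; i < count; i++) {
            out[i] = in.readDouble();
        }
        return out;
    }

    // Guarded shape: never touches the stream when there is nothing to read.
    static double[] readGuarded(DataInputStream in, int count) throws IOException {
        double[] out = new double[count];
        for (int i = 0; i < count; i++) {
            out[i] = in.readDouble();
        }
        return out;
    }

    // Helper: a stream with zero bytes, standing in for a DATA stream of length 0.
    static DataInputStream emptyStream() {
        return new DataInputStream(new ByteArrayInputStream(new byte[0]));
    }

    public static void main(String[] args) throws IOException {
        System.out.println("guarded length: " + readGuarded(emptyStream(), 0).length);
        try {
            readUnguarded(emptyStream(), 0);
        } catch (EOFException e) {
            System.out.println("unguarded read failed with EOFException");
        }
    }
}
```

Under these assumptions, the guarded variant returns an empty array for an empty list, while the unguarded one fails with an EOFException, which matches the kind of error in the report.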



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)
