orc-dev mailing list archives

From "Shardul Mahadik (Jira)" <j...@apache.org>
Subject [jira] [Created] (ORC-555) IllegalArgumentException when reading files written with older ORC writers in ORC 1.6
Date Thu, 26 Sep 2019 01:14:00 GMT
Shardul Mahadik created ORC-555:
-----------------------------------

             Summary: IllegalArgumentException when reading files written with older ORC writers in ORC 1.6
                 Key: ORC-555
                 URL: https://issues.apache.org/jira/browse/ORC-555
             Project: ORC
          Issue Type: Bug
            Reporter: Shardul Mahadik


I am using {{orc-core::nohive}} to read an ORC file that was generated by an older version
of ORC (probably through Hive 1.1). The file reads fine with ORC 1.5.5, but reading it has
failed since ORC 1.6.

Code:
{code:java}
final Reader orcReader = OrcFile.createReader(new Path("/Users/smahadik/orcFailure.orc"),
    OrcFile.readerOptions(new Configuration()));
System.out.println(orcReader.getNumberOfRows());
{code}

Stacktrace:
{code:java}
java.io.IOException: Problem reading file footer /Users/smahadik/orcFailure.orc

	at org.apache.orc.impl.ReaderImpl.extractFileTail(ReaderImpl.java:716)
	at org.apache.orc.impl.ReaderImpl.<init>(ReaderImpl.java:500)
	at org.apache.orc.OrcFile.createReader(OrcFile.java:365)
	at example.testFileFooterReadFailure(TestOrcMetrics.java:16)
	at sun.reflect.NativeMethodAccessorImpl.invoke0(Native Method)
	at sun.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:62)
	at sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:43)
	at java.lang.reflect.Method.invoke(Method.java:498)
	at org.junit.runners.model.FrameworkMethod$1.runReflectiveCall(FrameworkMethod.java:50)
	at org.junit.internal.runners.model.ReflectiveCallable.run(ReflectiveCallable.java:12)
	at org.junit.runners.model.FrameworkMethod.invokeExplosively(FrameworkMethod.java:47)
	at org.junit.internal.runners.statements.InvokeMethod.evaluate(InvokeMethod.java:17)
	at org.junit.runners.ParentRunner.runLeaf(ParentRunner.java:325)
	at org.junit.runners.BlockJUnit4ClassRunner.runChild(BlockJUnit4ClassRunner.java:78)
	at org.junit.runners.BlockJUnit4ClassRunner.runChild(BlockJUnit4ClassRunner.java:57)
	at org.junit.runners.ParentRunner$3.run(ParentRunner.java:290)
	at org.junit.runners.ParentRunner$1.schedule(ParentRunner.java:71)
	at org.junit.runners.ParentRunner.runChildren(ParentRunner.java:288)
	at org.junit.runners.ParentRunner.access$000(ParentRunner.java:58)
	at org.junit.runners.ParentRunner$2.evaluate(ParentRunner.java:268)
	at org.junit.runners.ParentRunner.run(ParentRunner.java:363)
	at org.junit.runner.JUnitCore.run(JUnitCore.java:137)
	at com.intellij.junit4.JUnit4IdeaTestRunner.startRunnerWithArgs(JUnit4IdeaTestRunner.java:68)
	at com.intellij.rt.execution.junit.IdeaTestRunner$Repeater.startRunnerWithArgs(IdeaTestRunner.java:47)
	at com.intellij.rt.execution.junit.JUnitStarter.prepareStreamsAndStart(JUnitStarter.java:242)
	at com.intellij.rt.execution.junit.JUnitStarter.main(JUnitStarter.java:70)
Caused by: java.lang.IllegalArgumentException
	at java.nio.Buffer.position(Buffer.java:244)
	at org.apache.orc.impl.InStream$CompressedStream.setCurrent(InStream.java:453)
	at org.apache.orc.impl.InStream$CompressedStream.reset(InStream.java:440)
	at org.apache.orc.impl.InStream$CompressedStream.<init>(InStream.java:426)
	at org.apache.orc.impl.InStream.create(InStream.java:843)
	at org.apache.orc.impl.ReaderImpl.extractFileTail(ReaderImpl.java:706)
	... 25 more
{code}

Unfortunately, I cannot share the data file that triggers the failure. I am not very familiar
with the ORC codebase, so I am not sure what is actually happening here. I will keep digging
and report anything else I find.

Here's what I know so far. The error occurs at https://github.com/apache/orc/blob/d10142c49fa4d4bdc9d187195a34377f60d486b1/java/core/src/java/org/apache/orc/impl/InStream.java#L453
because the limit of the {{compressed}} buffer is less than the position the code is trying to set.
Execution goes through this if condition in {{ReaderImpl}}, which was changed recently: https://github.com/apache/orc/blob/d10142c49fa4d4bdc9d187195a34377f60d486b1/java/core/src/java/org/apache/orc/impl/ReaderImpl.java#L691
The {{extra}} value is around 3k, so the code appears to replace the original buffer (limit 16k)
with a new buffer (limit 3k). This smaller buffer is passed to https://github.com/apache/orc/blob/d10142c49fa4d4bdc9d187195a34377f60d486b1/java/core/src/java/org/apache/orc/impl/ReaderImpl.java#L706
and the read eventually fails.
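
To illustrate the {{java.nio.Buffer.position}} frame at the top of the cause, here is a minimal standalone sketch that does not use any ORC code. It only shows the documented JDK behavior: setting a position beyond a buffer's limit throws {{IllegalArgumentException}}. The class name, the helper {{positionRejected}}, and the position 5000 are hypothetical and chosen for illustration; only the 16384 and 3930 limits come from the values in this report.

```java
import java.nio.ByteBuffer;

// Standalone illustration, not ORC code. Names are hypothetical.
final class BufferPositionDemo {

    // Returns true if Buffer.position(newPosition) is rejected
    // for a buffer whose limit equals the given value.
    static boolean positionRejected(int limit, int newPosition) {
        ByteBuffer buf = ByteBuffer.allocate(limit);
        try {
            buf.position(newPosition);
            return false;
        } catch (IllegalArgumentException e) {
            // Thrown when newPosition is negative or greater than the limit.
            return true;
        }
    }

    public static void main(String[] args) {
        int hypotheticalPosition = 5000; // assumed value, not taken from the report

        // A 16k buffer accepts the position.
        System.out.println("limit 16384: rejected = "
            + positionRejected(16384, hypotheticalPosition));

        // A 3930-byte buffer rejects the same position.
        System.out.println("limit 3930:  rejected = "
            + positionRejected(3930, hypotheticalPosition));
    }
}
```

If the offset used in {{setCurrent}} was computed relative to the larger buffer, it would fail in this way against the smaller one. That is a guess from the symptoms, not something confirmed in the ORC source.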

Values of some variables at line 706 of {{ReaderImpl}}:
size = 309950950
readSize = 16384
psLen = 26
psOffset = 309950923
tailSize = 20314
footerSize = 3650
metadataSize = 16637
extra = 3930
buffer = data range [309930636, 309934566), size: 3930 type: array-backed
buffer.next = data range [309934566, 309950950), size: 16384 type: array-backed
stripeStatSize = 0
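
As a sanity check on the numbers above, the following sketch recomputes the derived values from the raw ones. The relations it uses (tail = metadata + footer + postscript + 1 length byte, and {{extra}} = tail size minus read size) are assumptions inferred from the reported values, not copied from the ORC source. The class and method names are hypothetical.

```java
// Standalone arithmetic check, not ORC code. Names are hypothetical.
final class TailLayoutCheck {
    // Raw values copied from the debugger output in this report.
    static final long SIZE = 309950950L;
    static final int READ_SIZE = 16384;
    static final int PS_LEN = 26;
    static final int FOOTER_SIZE = 3650;
    static final int METADATA_SIZE = 16637;

    // Assumed layout: metadata, footer, postscript, then 1 byte holding psLen.
    static int tailSize() { return 1 + PS_LEN + FOOTER_SIZE + METADATA_SIZE; }

    // Bytes of the tail that did not fit in the initial read.
    static int extra() { return Math.max(0, tailSize() - READ_SIZE); }

    // Absolute file offsets.
    static long psOffset() { return SIZE - 1 - PS_LEN; }
    static long tailStart() { return SIZE - tailSize(); }
    static long firstChunkEnd() { return tailStart() + extra(); }
    static long footerStart() { return psOffset() - FOOTER_SIZE; }

    public static void main(String[] args) {
        System.out.println("tailSize      = " + tailSize());
        System.out.println("extra         = " + extra());
        System.out.println("psOffset      = " + psOffset());
        System.out.println("first chunk   = [" + tailStart() + ", " + firstChunkEnd() + ")");
        System.out.println("footerStart   = " + footerStart());
        System.out.println("footer offset from tail start = " + (footerStart() - tailStart()));
        System.out.println("footer starts inside first chunk: "
            + (footerStart() < firstChunkEnd()));
    }
}
```

Under these assumptions the computed {{tailSize}} (20314), {{extra}} (3930), {{psOffset}} (309950923) and first buffer range match the reported values. The footer would start at 309947273, which is 16637 bytes past the start of the tail and therefore in {{buffer.next}} rather than in the 3930-byte {{buffer}}.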

Does anyone have any insights/intuition about what might be happening and how we can debug
this? 



