hive-dev mailing list archives

From: "Gopal V (JIRA)" <j...@apache.org>
Subject: [jira] [Created] (HIVE-10918) ORC fails to read table with a 38Gb ORC file
Date: Wed, 03 Jun 2015 23:30:38 GMT
Gopal V created HIVE-10918:
------------------------------

             Summary: ORC fails to read table with a 38Gb ORC file
                 Key: HIVE-10918
                 URL: https://issues.apache.org/jira/browse/HIVE-10918
             Project: Hive
          Issue Type: Bug
          Components: File Formats
    Affects Versions: 1.3.0
            Reporter: Gopal V


{code}

hive> set mapreduce.input.fileinputformat.split.maxsize=1000000000000;
hive> alter table lineitem concatenate;
..
hive> dfs -ls /apps/hive/warehouse/tpch_orc_flat_1000.db/lineitem;
Found 12 items
-rwxr-xr-x   3 gopal supergroup 41368976599 2015-06-03 15:49 /apps/hive/warehouse/tpch_orc_flat_1000.db/lineitem/000000_0
-rwxr-xr-x   3 gopal supergroup 36226719673 2015-06-03 15:48 /apps/hive/warehouse/tpch_orc_flat_1000.db/lineitem/000001_0
-rwxr-xr-x   3 gopal supergroup 27544042018 2015-06-03 15:50 /apps/hive/warehouse/tpch_orc_flat_1000.db/lineitem/000002_0
-rwxr-xr-x   3 gopal supergroup 23147063608 2015-06-03 15:44 /apps/hive/warehouse/tpch_orc_flat_1000.db/lineitem/000003_0
-rwxr-xr-x   3 gopal supergroup 21079035936 2015-06-03 15:44 /apps/hive/warehouse/tpch_orc_flat_1000.db/lineitem/000004_0
-rwxr-xr-x   3 gopal supergroup 13813961419 2015-06-03 15:43 /apps/hive/warehouse/tpch_orc_flat_1000.db/lineitem/000005_0
-rwxr-xr-x   3 gopal supergroup  8155299977 2015-06-03 15:40 /apps/hive/warehouse/tpch_orc_flat_1000.db/lineitem/000006_0
-rwxr-xr-x   3 gopal supergroup  6264478613 2015-06-03 15:40 /apps/hive/warehouse/tpch_orc_flat_1000.db/lineitem/000007_0
-rwxr-xr-x   3 gopal supergroup  4653393054 2015-06-03 15:40 /apps/hive/warehouse/tpch_orc_flat_1000.db/lineitem/000008_0
-rwxr-xr-x   3 gopal supergroup  3621672928 2015-06-03 15:39 /apps/hive/warehouse/tpch_orc_flat_1000.db/lineitem/000009_0
-rwxr-xr-x   3 gopal supergroup  1460919310 2015-06-03 15:38 /apps/hive/warehouse/tpch_orc_flat_1000.db/lineitem/000010_0
-rwxr-xr-x   3 gopal supergroup   485129789 2015-06-03 15:38 /apps/hive/warehouse/tpch_orc_flat_1000.db/lineitem/000011_0
{code}
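
For reference, a minimal sketch (not part of the original report) that converts the byte counts in the listing above to GiB. It assumes the "38Gb" in the summary refers to the largest file measured in binary gigabytes; the names `SIZES`, `GIB` and `to_gib` are hypothetical helpers introduced only for this illustration.

```python
# Sizes in bytes, copied from the `dfs -ls` listing above.
SIZES = {
    "000000_0": 41368976599,
    "000001_0": 36226719673,
    "000002_0": 27544042018,
    "000003_0": 23147063608,
    "000004_0": 21079035936,
    "000005_0": 13813961419,
    "000006_0": 8155299977,
    "000007_0": 6264478613,
    "000008_0": 4653393054,
    "000009_0": 3621672928,
    "000010_0": 1460919310,
    "000011_0": 485129789,
}

GIB = 1024 ** 3  # assumption: "Gb" in the summary means binary gigabytes


def to_gib(num_bytes):
    """Convert a byte count to GiB."""
    return num_bytes / GIB


# Print every file, largest first, with its size in GiB.
for name, size in sorted(SIZES.items(), key=lambda item: -item[1]):
    print(f"{name}: {to_gib(size):.2f} GiB")

# The largest file is the likely candidate for the one named in the summary.
largest = max(SIZES, key=SIZES.get)
print("largest:", largest)
```

Under that assumption, the largest file, 000000_0, comes out at roughly 38.5 GiB, which matches the size quoted in the summary.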

Reading the table fails with PPD (predicate pushdown) disabled.

My suspicion is that the problem lies in how ORC stripe padding and the stream offsets in the stream information are handled when concatenating.

{code}
Caused by: java.io.EOFException: Read past end of RLE integer from compressed stream Stream for column 1 kind DATA position: 1608840 length: 1608840 range: 0 offset: 1608840 limit: 1608840 range 0 = 0 to 1608840 uncompressed: 36845 to 36845
        at org.apache.hadoop.hive.ql.io.orc.RunLengthIntegerReaderV2.readValues(RunLengthIntegerReaderV2.java:56)
        at org.apache.hadoop.hive.ql.io.orc.RunLengthIntegerReaderV2.next(RunLengthIntegerReaderV2.java:302)
        at org.apache.hadoop.hive.ql.io.orc.RunLengthIntegerReaderV2.nextVector(RunLengthIntegerReaderV2.java:346)
        at org.apache.hadoop.hive.ql.io.orc.TreeReaderFactory$LongTreeReader.nextVector(TreeReaderFactory.java:582)
        at org.apache.hadoop.hive.ql.io.orc.TreeReaderFactory$StructTreeReader.nextVector(TreeReaderFactory.java:2026)
        at org.apache.hadoop.hive.ql.io.orc.RecordReaderImpl.nextBatch(RecordReaderImpl.java:1070)
        ... 25 more
{code}
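
A small sketch (not from the original report) that pulls the numeric fields out of the exception message above, to make the stream state easier to read. The helper `parse_stream_state` is hypothetical, and reading `position` and `length` as byte offsets into the compressed stream is an assumption based only on the field names.

```python
import re

# Diagnostic text copied from the stack trace above.
MESSAGE = (
    "Read past end of RLE integer from compressed stream Stream for column 1 "
    "kind DATA position: 1608840 length: 1608840 range: 0 offset: 1608840 "
    "limit: 1608840 range 0 = 0 to 1608840 uncompressed: 36845 to 36845"
)


def parse_stream_state(message):
    """Collect every 'name: number' pair in the message (hypothetical helper)."""
    return {key: int(value) for key, value in re.findall(r"(\w+): (\d+)", message)}


state = parse_stream_state(MESSAGE)

# Assumption: position and length are offsets into the compressed stream,
# so their difference is the number of compressed bytes still unread.
remaining = state["length"] - state["position"]

print(state)
print("compressed bytes remaining:", remaining)
```

With the values in this trace, `position`, `length`, `offset` and `limit` are all 1608840, so the stream appears to be fully consumed at the point where the reader asks for more values.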



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)
