hive-issues mailing list archives

From "Gopal V (JIRA)" <j...@apache.org>
Subject [jira] [Resolved] (HIVE-10918) ORC fails to read table with a 38Gb ORC file
Date Wed, 03 Jun 2015 23:53:38 GMT

     [ https://issues.apache.org/jira/browse/HIVE-10918?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Gopal V resolved HIVE-10918.
----------------------------
    Resolution: Duplicate

> ORC fails to read table with a 38Gb ORC file
> --------------------------------------------
>
>                 Key: HIVE-10918
>                 URL: https://issues.apache.org/jira/browse/HIVE-10918
>             Project: Hive
>          Issue Type: Bug
>          Components: File Formats
>    Affects Versions: 1.3.0
>            Reporter: Gopal V
>
> {code}
> hive>  set mapreduce.input.fileinputformat.split.maxsize=1000000000000;
> hive> set  mapreduce.input.fileinputformat.split.maxsize=1000000000000;
> hive> alter table lineitem concatenate;
> ..
> hive> dfs -ls /apps/hive/warehouse/tpch_orc_flat_1000.db/lineitem;
> Found 12 items
> -rwxr-xr-x   3 gopal supergroup 41368976599 2015-06-03 15:49 /apps/hive/warehouse/tpch_orc_flat_1000.db/lineitem/000000_0
> -rwxr-xr-x   3 gopal supergroup 36226719673 2015-06-03 15:48 /apps/hive/warehouse/tpch_orc_flat_1000.db/lineitem/000001_0
> -rwxr-xr-x   3 gopal supergroup 27544042018 2015-06-03 15:50 /apps/hive/warehouse/tpch_orc_flat_1000.db/lineitem/000002_0
> -rwxr-xr-x   3 gopal supergroup 23147063608 2015-06-03 15:44 /apps/hive/warehouse/tpch_orc_flat_1000.db/lineitem/000003_0
> -rwxr-xr-x   3 gopal supergroup 21079035936 2015-06-03 15:44 /apps/hive/warehouse/tpch_orc_flat_1000.db/lineitem/000004_0
> -rwxr-xr-x   3 gopal supergroup 13813961419 2015-06-03 15:43 /apps/hive/warehouse/tpch_orc_flat_1000.db/lineitem/000005_0
> -rwxr-xr-x   3 gopal supergroup  8155299977 2015-06-03 15:40 /apps/hive/warehouse/tpch_orc_flat_1000.db/lineitem/000006_0
> -rwxr-xr-x   3 gopal supergroup  6264478613 2015-06-03 15:40 /apps/hive/warehouse/tpch_orc_flat_1000.db/lineitem/000007_0
> -rwxr-xr-x   3 gopal supergroup  4653393054 2015-06-03 15:40 /apps/hive/warehouse/tpch_orc_flat_1000.db/lineitem/000008_0
> -rwxr-xr-x   3 gopal supergroup  3621672928 2015-06-03 15:39 /apps/hive/warehouse/tpch_orc_flat_1000.db/lineitem/000009_0
> -rwxr-xr-x   3 gopal supergroup  1460919310 2015-06-03 15:38 /apps/hive/warehouse/tpch_orc_flat_1000.db/lineitem/000010_0
> -rwxr-xr-x   3 gopal supergroup   485129789 2015-06-03 15:38 /apps/hive/warehouse/tpch_orc_flat_1000.db/lineitem/000011_0
> {code}
> The read fails even without predicate pushdown (PPD).
> I suspect the ORC stripe padding and the stream offsets recorded in the stream information are wrong after concatenation.
> {code}
> Caused by: java.io.EOFException: Read past end of RLE integer from compressed stream Stream for column 1 kind DATA position: 1608840 length: 1608840 range: 0 offset: 1608840 limit: 1608840 range 0 = 0 to 1608840 uncompressed: 36845 to 36845
>         at org.apache.hadoop.hive.ql.io.orc.RunLengthIntegerReaderV2.readValues(RunLengthIntegerReaderV2.java:56)
>         at org.apache.hadoop.hive.ql.io.orc.RunLengthIntegerReaderV2.next(RunLengthIntegerReaderV2.java:302)
>         at org.apache.hadoop.hive.ql.io.orc.RunLengthIntegerReaderV2.nextVector(RunLengthIntegerReaderV2.java:346)
>         at org.apache.hadoop.hive.ql.io.orc.TreeReaderFactory$LongTreeReader.nextVector(TreeReaderFactory.java:582)
>         at org.apache.hadoop.hive.ql.io.orc.TreeReaderFactory$StructTreeReader.nextVector(TreeReaderFactory.java:2026)
>         at org.apache.hadoop.hive.ql.io.orc.RecordReaderImpl.nextBatch(RecordReaderImpl.java:1070)
>         ... 25 more
> {code}



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)
