drill-issues mailing list archives

From "Chun Chang (JIRA)" <j...@apache.org>
Subject [jira] [Updated] (DRILL-4767) Parquet reader throw IllegalArgumentException for int32 type with GZIP compression
Date Wed, 17 Aug 2016 01:27:21 GMT

     [ https://issues.apache.org/jira/browse/DRILL-4767?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Chun Chang updated DRILL-4767:
------------------------------
    Priority: Blocker  (was: Major)

> Parquet reader throw IllegalArgumentException for int32 type with GZIP compression
> ----------------------------------------------------------------------------------
>
>                 Key: DRILL-4767
>                 URL: https://issues.apache.org/jira/browse/DRILL-4767
>             Project: Apache Drill
>          Issue Type: Bug
>          Components: Storage - Parquet
>    Affects Versions: 1.7.0
>            Reporter: Chun Chang
>            Priority: Blocker
>             Fix For: 1.8.0
>
>         Attachments: int32_10_bs10k_ps1k_gzip.parquet
>
>
> Created a small parquet file with the following schema:
> {noformat}
> [root@perfnode166 parquet-mr]# java -jar parquet-tools/target/parquet-tools-1.8.2-SNAPSHOT.jar schema /mapr/drill50.perf.lab/drill/testdata/parquet_storage/int32_10_bs10k_ps1k_gzip.parquet
> message test {
>   required int32 int32_field_required;
>   optional int32 int32_field_optional;
>   repeated int32 int32_field_repeated;
> }
> {noformat}
> and the following metadata:
> {noformat}
> [root@perfnode166 parquet-mr]# java -jar parquet-tools/target/parquet-tools-1.8.2-SNAPSHOT.jar meta /mapr/drill50.perf.lab/drill/testdata/parquet_storage/int32_10_bs10k_ps1k_gzip.parquet
> file:                 file:/mapr/drill50.perf.lab/drill/testdata/parquet_storage/int32_10_bs10k_ps1k_gzip.parquet
> creator:              parquet-mr version 1.8.2-SNAPSHOT (build 0cfa025d6ffeee07cb0fa2125c977185b849e5c9)
> extra:                writer.model.name = example
> file schema:          test
> --------------------------------------------------------------------------------
> int32_field_required: REQUIRED INT32 R:0 D:0
> int32_field_optional: OPTIONAL INT32 R:0 D:1
> int32_field_repeated: REPEATED INT32 R:1 D:1
> row group 1:          RC:10 TS:147 OFFSET:4
> --------------------------------------------------------------------------------
> int32_field_required:  INT32 GZIP DO:0 FPO:4 SZ:65/47/0.72 VC:10 ENC:DELTA_BINARY_PACKED
> int32_field_optional:  INT32 GZIP DO:0 FPO:69 SZ:67/49/0.73 VC:10 ENC:DELTA_BINARY_PACKED
> int32_field_repeated:  INT32 GZIP DO:0 FPO:136 SZ:69/51/0.74 VC:10 ENC:DELTA_BINARY_PACKED
> {noformat}
> and the following dump:
> {noformat}
> [root@perfnode166 parquet-mr]# java -jar parquet-tools/target/parquet-tools-1.8.2-SNAPSHOT.jar dump /mapr/drill50.perf.lab/drill/testdata/parquet_storage/int32_10_bs10k_ps1k_gzip.parquet
> row group 0
> --------------------------------------------------------------------------------
> int32_field_required:  INT32 GZIP DO:0 FPO:4 SZ:65/47/0.72 VC:10 ENC:D [more]...
> int32_field_optional:  INT32 GZIP DO:0 FPO:69 SZ:67/49/0.73 VC:10 ENC: [more]...
> int32_field_repeated:  INT32 GZIP DO:0 FPO:136 SZ:69/51/0.74 VC:10 ENC [more]...
>     int32_field_required TV=10 RL=0 DL=0
>     ----------------------------------------------------------------------------
>     page 0:  DLE:RLE RLE:RLE VLE:DELTA_BINARY_PACKED ST:[min: 0, max:  [more]... VC:10
>     int32_field_optional TV=10 RL=0 DL=1
>     ----------------------------------------------------------------------------
>     page 0:  DLE:RLE RLE:RLE VLE:DELTA_BINARY_PACKED ST:[min: 1, max:  [more]... VC:10
>     int32_field_repeated TV=10 RL=1 DL=1
>     ----------------------------------------------------------------------------
>     page 0:  DLE:RLE RLE:RLE VLE:DELTA_BINARY_PACKED ST:[min: 2, max:  [more]... VC:10
> INT32 int32_field_required
> --------------------------------------------------------------------------------
> *** row group 1 of 1, values 1 to 10 ***
> value 1:  R:0 D:0 V:0
> value 2:  R:0 D:0 V:3
> value 3:  R:0 D:0 V:6
> value 4:  R:0 D:0 V:9
> value 5:  R:0 D:0 V:12
> value 6:  R:0 D:0 V:15
> value 7:  R:0 D:0 V:18
> value 8:  R:0 D:0 V:21
> value 9:  R:0 D:0 V:24
> value 10: R:0 D:0 V:27
> INT32 int32_field_optional
> --------------------------------------------------------------------------------
> *** row group 1 of 1, values 1 to 10 ***
> value 1:  R:0 D:1 V:1
> value 2:  R:0 D:1 V:4
> value 3:  R:0 D:1 V:7
> value 4:  R:0 D:1 V:10
> value 5:  R:0 D:1 V:13
> value 6:  R:0 D:1 V:16
> value 7:  R:0 D:1 V:19
> value 8:  R:0 D:1 V:22
> value 9:  R:0 D:1 V:25
> value 10: R:0 D:1 V:28
> INT32 int32_field_repeated
> --------------------------------------------------------------------------------
> *** row group 1 of 1, values 1 to 10 ***
> value 1:  R:0 D:1 V:2
> value 2:  R:0 D:1 V:5
> value 3:  R:0 D:1 V:8
> value 4:  R:0 D:1 V:11
> value 5:  R:0 D:1 V:14
> value 6:  R:0 D:1 V:17
> value 7:  R:0 D:1 V:20
> value 8:  R:0 D:1 V:23
> value 9:  R:0 D:1 V:26
> value 10: R:0 D:1 V:29
> {noformat}
> But when I query the file through Drill, I get the following error:
> {noformat}
> 0: jdbc:drill:schema=dfs.drillTestDir> select * from dfs.`drill/testdata/parquet_storage/int32_10_bs10k_ps1k_gzip.parquet`;
> Error: SYSTEM ERROR: IllegalArgumentException
> Fragment 0:0
> [Error Id: d91ec9fe-0ce3-4d05-9e5b-d53cebb99726 on 10.10.30.169:31010] (state=,code=0)
> 0: jdbc:drill:schema=dfs.drillTestDir> select * from sys.version;
> +-----------------+-------------------------------------------+---------------------------------------------------------------------------+----------------------------+---------------------+----------------------------+
> |     version     |                 commit_id                 |                              commit_message                               |        commit_time         |     build_email     |         build_time         |
> +-----------------+-------------------------------------------+---------------------------------------------------------------------------+----------------------------+---------------------+----------------------------+
> | 1.7.0-SNAPSHOT  | 1c9e92b0cec18b4ee5a005fd6006ad329e3fa568  | DRILL-4574: Avro Plugin: Flatten does not work correctly on record items  | 24.06.2016 @ 15:07:25 PDT  | inramana@gmail.com  | 27.06.2016 @ 10:38:46 PDT  |
> +-----------------+-------------------------------------------+---------------------------------------------------------------------------+----------------------------+---------------------+----------------------------+
> {noformat}
> drillbit.log:
> {noformat}
> 2016-07-06 16:21:14,139 [28826d94-a4bb-325d-6475-d440a1c78da0:foreman] INFO  o.a.drill.exec.work.foreman.Foreman - Query text for query id 28826d94-a4bb-325d-6475-d440a1c78da0: select * from dfs.`drill/testdata/parquet_storage/int32_10_bs10k_ps1k_gzip.parquet`
> 2016-07-06 16:21:14,395 [28826d94-a4bb-325d-6475-d440a1c78da0:foreman] INFO  o.a.d.exec.store.parquet.Metadata - Took 0 ms to get file statuses
> 2016-07-06 16:21:14,398 [28826d94-a4bb-325d-6475-d440a1c78da0:foreman] INFO  o.a.d.exec.store.parquet.Metadata - Fetch parquet metadata: Executed 1 out of 1 using 1 threads. Time: 2ms total, 2.513895ms avg, 2ms max.
> 2016-07-06 16:21:14,398 [28826d94-a4bb-325d-6475-d440a1c78da0:foreman] INFO  o.a.d.exec.store.parquet.Metadata - Fetch parquet metadata: Executed 1 out of 1 using 1 threads. Earliest start: 0.907000 μs, Latest start: 0.907000 μs, Average start: 0.907000 μs .
> 2016-07-06 16:21:14,399 [28826d94-a4bb-325d-6475-d440a1c78da0:foreman] INFO  o.a.d.exec.store.parquet.Metadata - Took 2 ms to read file metadata
> 2016-07-06 16:21:14,518 [28826d94-a4bb-325d-6475-d440a1c78da0:frag:0:0] INFO  o.a.d.e.w.fragment.FragmentExecutor - 28826d94-a4bb-325d-6475-d440a1c78da0:0:0: State change requested AWAITING_ALLOCATION --> FAILED
> 2016-07-06 16:21:14,519 [28826d94-a4bb-325d-6475-d440a1c78da0:frag:0:0] INFO  o.a.d.e.w.fragment.FragmentExecutor - 28826d94-a4bb-325d-6475-d440a1c78da0:0:0: State change requested FAILED --> FAILED
> 2016-07-06 16:21:14,519 [28826d94-a4bb-325d-6475-d440a1c78da0:frag:0:0] INFO  o.a.d.e.w.fragment.FragmentExecutor - 28826d94-a4bb-325d-6475-d440a1c78da0:0:0: State change requested FAILED --> FAILED
> 2016-07-06 16:21:14,519 [28826d94-a4bb-325d-6475-d440a1c78da0:frag:0:0] INFO  o.a.d.e.w.fragment.FragmentExecutor - 28826d94-a4bb-325d-6475-d440a1c78da0:0:0: State change requested FAILED --> FINISHED
> 2016-07-06 16:21:14,529 [28826d94-a4bb-325d-6475-d440a1c78da0:frag:0:0] ERROR o.a.d.e.w.fragment.FragmentExecutor - SYSTEM ERROR: IllegalArgumentException
> Fragment 0:0
> [Error Id: d91ec9fe-0ce3-4d05-9e5b-d53cebb99726 on 10.10.30.169:31010]
> org.apache.drill.common.exceptions.UserException: SYSTEM ERROR: IllegalArgumentException
> Fragment 0:0
> [Error Id: d91ec9fe-0ce3-4d05-9e5b-d53cebb99726 on 10.10.30.169:31010]
> 	at org.apache.drill.common.exceptions.UserException$Builder.build(UserException.java:543) ~[drill-common-1.7.0-SNAPSHOT.jar:1.7.0-SNAPSHOT]
> 	at org.apache.drill.exec.work.fragment.FragmentExecutor.sendFinalState(FragmentExecutor.java:318) [drill-java-exec-1.7.0-SNAPSHOT.jar:1.7.0-SNAPSHOT]
> 	at org.apache.drill.exec.work.fragment.FragmentExecutor.cleanup(FragmentExecutor.java:185) [drill-java-exec-1.7.0-SNAPSHOT.jar:1.7.0-SNAPSHOT]
> 	at org.apache.drill.exec.work.fragment.FragmentExecutor.run(FragmentExecutor.java:287) [drill-java-exec-1.7.0-SNAPSHOT.jar:1.7.0-SNAPSHOT]
> 	at org.apache.drill.common.SelfCleaningRunnable.run(SelfCleaningRunnable.java:38) [drill-common-1.7.0-SNAPSHOT.jar:1.7.0-SNAPSHOT]
> 	at java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1145) [na:1.7.0_79]
> 	at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:615) [na:1.7.0_79]
> 	at java.lang.Thread.run(Thread.java:745) [na:1.7.0_79]
> Caused by: org.apache.drill.common.exceptions.DrillRuntimeException: Error in drill parquet reader (complex).
> Message: Failure in setting up reader
> Parquet Metadata: ParquetMetaData{FileMetaData{schema: message test {
>   required int32 int32_field_required;
>   optional int32 int32_field_optional;
>   repeated int32 int32_field_repeated;
> }
> , metadata: {writer.model.name=example}}, blocks: [BlockMetaData{10, 147 [ColumnMetaData{GZIP [int32_field_required] INT32  [DELTA_BINARY_PACKED], 4}, ColumnMetaData{GZIP [int32_field_optional] INT32  [DELTA_BINARY_PACKED], 69}, ColumnMetaData{GZIP [int32_field_repeated] INT32  [DELTA_BINARY_PACKED], 136}]}]}
> 	at org.apache.drill.exec.store.parquet2.DrillParquetReader.handleAndRaise(DrillParquetReader.java:279) ~[drill-java-exec-1.7.0-SNAPSHOT.jar:1.7.0-SNAPSHOT]
> 	at org.apache.drill.exec.store.parquet2.DrillParquetReader.setup(DrillParquetReader.java:271) ~[drill-java-exec-1.7.0-SNAPSHOT.jar:1.7.0-SNAPSHOT]
> 	at org.apache.drill.exec.physical.impl.ScanBatch.<init>(ScanBatch.java:101) ~[drill-java-exec-1.7.0-SNAPSHOT.jar:1.7.0-SNAPSHOT]
> 	at org.apache.drill.exec.store.parquet.ParquetScanBatchCreator.getBatch(ParquetScanBatchCreator.java:140) ~[drill-java-exec-1.7.0-SNAPSHOT.jar:1.7.0-SNAPSHOT]
> 	at org.apache.drill.exec.store.parquet.ParquetScanBatchCreator.getBatch(ParquetScanBatchCreator.java:53) ~[drill-java-exec-1.7.0-SNAPSHOT.jar:1.7.0-SNAPSHOT]
> 	at org.apache.drill.exec.physical.impl.ImplCreator.getRecordBatch(ImplCreator.java:148) ~[drill-java-exec-1.7.0-SNAPSHOT.jar:1.7.0-SNAPSHOT]
> 	at org.apache.drill.exec.physical.impl.ImplCreator.getChildren(ImplCreator.java:171) ~[drill-java-exec-1.7.0-SNAPSHOT.jar:1.7.0-SNAPSHOT]
> 	at org.apache.drill.exec.physical.impl.ImplCreator.getRecordBatch(ImplCreator.java:128) ~[drill-java-exec-1.7.0-SNAPSHOT.jar:1.7.0-SNAPSHOT]
> 	at org.apache.drill.exec.physical.impl.ImplCreator.getChildren(ImplCreator.java:171) ~[drill-java-exec-1.7.0-SNAPSHOT.jar:1.7.0-SNAPSHOT]
> 	at org.apache.drill.exec.physical.impl.ImplCreator.getRootExec(ImplCreator.java:101) ~[drill-java-exec-1.7.0-SNAPSHOT.jar:1.7.0-SNAPSHOT]
> 	at org.apache.drill.exec.physical.impl.ImplCreator.getExec(ImplCreator.java:79) ~[drill-java-exec-1.7.0-SNAPSHOT.jar:1.7.0-SNAPSHOT]
> 	at org.apache.drill.exec.work.fragment.FragmentExecutor.run(FragmentExecutor.java:231) [drill-java-exec-1.7.0-SNAPSHOT.jar:1.7.0-SNAPSHOT]
> 	... 4 common frames omitted
> Caused by: java.lang.IllegalArgumentException: null
> 	at java.nio.Buffer.limit(Buffer.java:267) ~[na:1.7.0_79]
> 	at org.apache.parquet.bytes.BytesInput$ByteBufferBytesInput.toByteBuffer(BytesInput.java:438) ~[parquet-encoding-1.8.1-drill-r0.jar:1.8.1-drill-r0]
> 	at org.apache.parquet.column.impl.ColumnReaderImpl.readPageV2(ColumnReaderImpl.java:612) ~[parquet-column-1.8.1-drill-r0.jar:1.8.1-drill-r0]
> 	at org.apache.parquet.column.impl.ColumnReaderImpl.access$400(ColumnReaderImpl.java:61) ~[parquet-column-1.8.1-drill-r0.jar:1.8.1-drill-r0]
> 	at org.apache.parquet.column.impl.ColumnReaderImpl$3.visit(ColumnReaderImpl.java:546) ~[parquet-column-1.8.1-drill-r0.jar:1.8.1-drill-r0]
> 	at org.apache.parquet.column.impl.ColumnReaderImpl$3.visit(ColumnReaderImpl.java:538) ~[parquet-column-1.8.1-drill-r0.jar:1.8.1-drill-r0]
> 	at org.apache.parquet.column.page.DataPageV2.accept(DataPageV2.java:141) ~[parquet-column-1.8.1-drill-r0.jar:1.8.1-drill-r0]
> 	at org.apache.parquet.column.impl.ColumnReaderImpl.readPage(ColumnReaderImpl.java:538) ~[parquet-column-1.8.1-drill-r0.jar:1.8.1-drill-r0]
> 	at org.apache.parquet.column.impl.ColumnReaderImpl.checkRead(ColumnReaderImpl.java:530) ~[parquet-column-1.8.1-drill-r0.jar:1.8.1-drill-r0]
> 	at org.apache.parquet.column.impl.ColumnReaderImpl.consume(ColumnReaderImpl.java:642) ~[parquet-column-1.8.1-drill-r0.jar:1.8.1-drill-r0]
> 	at org.apache.parquet.column.impl.ColumnReaderImpl.<init>(ColumnReaderImpl.java:358) ~[parquet-column-1.8.1-drill-r0.jar:1.8.1-drill-r0]
> 	at org.apache.parquet.column.impl.ColumnReadStoreImpl.newMemColumnReader(ColumnReadStoreImpl.java:82) ~[parquet-column-1.8.1-drill-r0.jar:1.8.1-drill-r0]
> 	at org.apache.parquet.column.impl.ColumnReadStoreImpl.getColumnReader(ColumnReadStoreImpl.java:77) ~[parquet-column-1.8.1-drill-r0.jar:1.8.1-drill-r0]
> 	at org.apache.parquet.io.RecordReaderImplementation.<init>(RecordReaderImplementation.java:270) ~[parquet-column-1.8.1-drill-r0.jar:1.8.1-drill-r0]
> 	at org.apache.parquet.io.MessageColumnIO$1.visit(MessageColumnIO.java:140) ~[parquet-column-1.8.1-drill-r0.jar:1.8.1-drill-r0]
> 	at org.apache.parquet.io.MessageColumnIO$1.visit(MessageColumnIO.java:106) ~[parquet-column-1.8.1-drill-r0.jar:1.8.1-drill-r0]
> 	at org.apache.parquet.filter2.compat.FilterCompat$NoOpFilter.accept(FilterCompat.java:154) ~[parquet-column-1.8.1-drill-r0.jar:1.8.1-drill-r0]
> 	at org.apache.parquet.io.MessageColumnIO.getRecordReader(MessageColumnIO.java:106) ~[parquet-column-1.8.1-drill-r0.jar:1.8.1-drill-r0]
> 	at org.apache.parquet.io.MessageColumnIO.getRecordReader(MessageColumnIO.java:82) ~[parquet-column-1.8.1-drill-r0.jar:1.8.1-drill-r0]
> 	at org.apache.drill.exec.store.parquet2.DrillParquetReader.setup(DrillParquetReader.java:268) ~[drill-java-exec-1.7.0-SNAPSHOT.jar:1.7.0-SNAPSHOT]
> 	... 14 common frames omitted
> 2016-07-06 16:21:14,585 [CONTROL-rpc-event-queue] WARN  o.a.drill.exec.work.foreman.Foreman - Dropping request to move to COMPLETED state as query is already at FAILED state (which is terminal).
> 2016-07-06 16:21:14,590 [CONTROL-rpc-event-queue] WARN  o.a.d.e.w.b.ControlMessageHandler - Dropping request to cancel fragment. 28826d94-a4bb-325d-6475-d440a1c78da0:0:0 does not exist.
> {noformat}
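
The innermost frame in the trace above is java.nio.Buffer.limit, which rejects a requested limit that is negative or larger than the buffer's capacity. The sketch below is not Drill or Parquet code; it only reproduces that JDK behavior. The class name BufferLimitDemo and the helper limitRejected are hypothetical, and the sizes 47 and 65 are borrowed from the SZ field of the first column purely for illustration. The log does not show which limit value the reader actually computed.

```java
import java.nio.ByteBuffer;

public class BufferLimitDemo {
    // Hypothetical helper: returns true if setting newLimit on a buffer of the
    // given capacity is rejected with IllegalArgumentException.
    static boolean limitRejected(int capacity, int newLimit) {
        ByteBuffer buf = ByteBuffer.allocate(capacity);
        try {
            buf.limit(newLimit); // throws when newLimit < 0 or newLimit > capacity
            return false;
        } catch (IllegalArgumentException e) {
            return true;
        }
    }

    public static void main(String[] args) {
        // Illustrative sizes only (assumption): 47-byte buffer, limits 47, 65, -1.
        System.out.println(limitRejected(47, 47)); // limit equal to capacity is accepted
        System.out.println(limitRejected(47, 65)); // limit beyond capacity is rejected
        System.out.println(limitRejected(47, -1)); // negative limit is rejected
    }
}
```

If the reader derives the limit from a page header field that does not match the decompressed page length, a call of this kind could fail in the same way, but that is a guess to be checked against ColumnReaderImpl.readPageV2 and not a confirmed root cause.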



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)
