drill-issues mailing list archives

Site index · List index
Message view « Date » · « Thread »
Top « Date » · « Thread »
From "Parth Chandra (JIRA)" <j...@apache.org>
Subject [jira] [Commented] (DRILL-3871) Exception on inner join when join predicate is int96 field generated by impala
Date Mon, 05 Oct 2015 23:49:26 GMT

    [ https://issues.apache.org/jira/browse/DRILL-3871?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14944270#comment-14944270
] 

Parth Chandra commented on DRILL-3871:
--------------------------------------

This is a more generic problem than just int96. The problem occurs for all fixed width types
where there is *one* null at the end of the last page. (There's an off by one error where
the code exits a loop with the last value has been read but not written to the vector. The
next iteration tries to read one more page because the required number of records have not
been read, leading to the exception).

> Exception on inner join when join predicate is int96 field generated by impala
> ------------------------------------------------------------------------------
>
>                 Key: DRILL-3871
>                 URL: https://issues.apache.org/jira/browse/DRILL-3871
>             Project: Apache Drill
>          Issue Type: Bug
>          Components: Execution - Data Types
>    Affects Versions: 1.2.0
>            Reporter: Victoria Markman
>            Assignee: Parth Chandra
>            Priority: Critical
>              Labels: int96
>             Fix For: 1.3.0
>
>         Attachments: tables.tar
>
>
> Both tables in the join where created by impala, with column c_timestamp being parquet
int96. 
> {code}
> 0: jdbc:drill:schema=dfs> select
> . . . . . . . . . . . . >         max(t1.c_timestamp),
> . . . . . . . . . . . . >         min(t1.c_timestamp),
> . . . . . . . . . . . . >         count(t1.c_timestamp)
> . . . . . . . . . . . . > from
> . . . . . . . . . . . . >         imp_t1 t1
> . . . . . . . . . . . . >                 inner join
> . . . . . . . . . . . . >         imp_t2 t2
> . . . . . . . . . . . . > on      (t1.c_timestamp = t2.c_timestamp)
> . . . . . . . . . . . . > ;
> java.lang.RuntimeException: java.sql.SQLException: SYSTEM ERROR: TProtocolException:
Required field 'uncompressed_page_size' was not found in serialized data! Struct: PageHeader(type:null,
uncompressed_page_size:0, compressed_page_size:0)
> Fragment 0:0
> [Error Id: eb6a5df8-fc59-409b-957a-59cb1079b5b8 on atsqa4-133.qa.lab:31010]
>         at sqlline.IncrementalRows.hasNext(IncrementalRows.java:73)
>         at sqlline.TableOutputFormat$ResizingRowsProvider.next(TableOutputFormat.java:87)
>         at sqlline.TableOutputFormat.print(TableOutputFormat.java:118)
>         at sqlline.SqlLine.print(SqlLine.java:1583)
>         at sqlline.Commands.execute(Commands.java:852)
>         at sqlline.Commands.sql(Commands.java:751)
>         at sqlline.SqlLine.dispatch(SqlLine.java:738)
>         at sqlline.SqlLine.begin(SqlLine.java:612)
>         at sqlline.SqlLine.start(SqlLine.java:366)
>         at sqlline.SqlLine.main(SqlLine.java:259)
> {code}
> drillbit.log
> {code}
> 2015-09-30 21:15:45,710 [29f3aefe-3209-a6e6-0418-500dac60a339:foreman] INFO  o.a.d.exec.store.parquet.Metadata
- Took 0 ms to get file statuses
> 2015-09-30 21:15:45,712 [29f3aefe-3209-a6e6-0418-500dac60a339:foreman] INFO  o.a.d.exec.store.parquet.Metadata
- Fetch parquet metadata: Executed 1 out of 1 using 1 threads. Time: 1ms total, 1.645381ms
avg, 1ms max.
> 2015-09-30 21:15:45,712 [29f3aefe-3209-a6e6-0418-500dac60a339:foreman] INFO  o.a.d.exec.store.parquet.Metadata
- Fetch parquet metadata: Executed 1 out of 1 using 1 threads. Earliest start: 1.332000 μs,
Latest start: 1.332000 μs, Average start: 1.332000 μs .
> 2015-09-30 21:15:45,830 [29f3aefe-3209-a6e6-0418-500dac60a339:frag:0:0] INFO  o.a.d.e.w.fragment.FragmentExecutor
- 29f3aefe-3209-a6e6-0418-500dac60a339:0:0: State change requested AWAITING_ALLOCATION -->
RUNNING
> 2015-09-30 21:15:45,830 [29f3aefe-3209-a6e6-0418-500dac60a339:frag:0:0] INFO  o.a.d.e.w.f.FragmentStatusReporter
- 29f3aefe-3209-a6e6-0418-500dac60a339:0:0: State to report: RUNNING
> 2015-09-30 21:15:45,925 [29f3aefe-3209-a6e6-0418-500dac60a339:frag:0:0] INFO  o.a.d.e.w.fragment.FragmentExecutor
- 29f3aefe-3209-a6e6-0418-500dac60a339:0:0: State change requested RUNNING --> FAILED
> 2015-09-30 21:15:45,930 [29f3aefe-3209-a6e6-0418-500dac60a339:frag:0:0] INFO  o.a.d.e.w.fragment.FragmentExecutor
- 29f3aefe-3209-a6e6-0418-500dac60a339:0:0: State change requested FAILED --> FINISHED
> 2015-09-30 21:15:45,931 [29f3aefe-3209-a6e6-0418-500dac60a339:frag:0:0] ERROR o.a.d.e.w.fragment.FragmentExecutor
- SYSTEM ERROR: TProtocolException: Required field 'uncompressed_page_size' was not found
in serialized data! Struct: PageHeader(type:null, uncompressed_page_size:0, compressed_page_size:0)
> Fragment 0:0
> [Error Id: eb6a5df8-fc59-409b-957a-59cb1079b5b8 on atsqa4-133.qa.lab:31010]
> org.apache.drill.common.exceptions.UserException: SYSTEM ERROR: TProtocolException: Required
field 'uncompressed_page_size' was not found in serialized data! Struct: PageHeader(type:null,
uncompressed_page_size:0, compressed_page_size:0)
> Fragment 0:0
> [Error Id: eb6a5df8-fc59-409b-957a-59cb1079b5b8 on atsqa4-133.qa.lab:31010]
>         at org.apache.drill.common.exceptions.UserException$Builder.build(UserException.java:534)
~[drill-common-1.2.0-SNAPSHOT.jar:1.2.0-SNAPSHOT]
>         at org.apache.drill.exec.work.fragment.FragmentExecutor.sendFinalState(FragmentExecutor.java:323)
[drill-java-exec-1.2.0-SNAPSHOT.jar:1.2.0-SNAPSHOT]
>         at org.apache.drill.exec.work.fragment.FragmentExecutor.cleanup(FragmentExecutor.java:178)
[drill-java-exec-1.2.0-SNAPSHOT.jar:1.2.0-SNAPSHOT]
>         at org.apache.drill.exec.work.fragment.FragmentExecutor.run(FragmentExecutor.java:292)
[drill-java-exec-1.2.0-SNAPSHOT.jar:1.2.0-SNAPSHOT]
>         at org.apache.drill.common.SelfCleaningRunnable.run(SelfCleaningRunnable.java:38)
[drill-common-1.2.0-SNAPSHOT.jar:1.2.0-SNAPSHOT]
>         at java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1145)
[na:1.7.0_71]
>         at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:615)
[na:1.7.0_71]
>         at java.lang.Thread.run(Thread.java:745) [na:1.7.0_71]
> Caused by: org.apache.drill.common.exceptions.DrillRuntimeException: Error in parquet
record reader.
> Message:
> Hadoop path: /drill/testdata/subqueries/imp_t2/bf4261140dac8d45-814d66b86bf960b8_853027779_data.0.parq
> Total records read: 10
> Mock records read: 0
> Records to read: 1
> Row group index: 0
> Records in row group: 10
> Parquet Metadata: ParquetMetaData{FileMetaData{schema: message schema {
>   optional binary c_varchar (UTF8);
>   optional int32 c_integer;
>   optional int64 c_bigint;
>   optional float c_float;
>   optional double c_double;
>   optional binary c_date (UTF8);
>   optional binary c_time (UTF8);
>   optional int96 c_timestamp;
>   optional boolean c_boolean;
>   optional double d9;
>   optional double d18;
>   optional double d28;
>   optional double d38;
> }
> , metadata: {}}, blocks: [BlockMetaData{10, 1507 [ColumnMetaData{SNAPPY [c_varchar] BINARY
 [PLAIN, PLAIN_DICTIONARY, RLE], 173}, ColumnMetaData{SNAPPY [c_integer] INT32  [PLAIN, PLAIN_DICTIONARY,
RLE], 299}, ColumnMetaData{SNAPPY [c_bigint] INT64  [PLAIN, PLAIN_DICTIONARY, RLE], 453},
ColumnMetaData{SNAPPY [c_float] FLOAT  [PLAIN, PLAIN_DICTIONARY, RLE], 581}, ColumnMetaData{SNAPPY
[c_double] DOUBLE  [PLAIN, PLAIN_DICTIONARY, RLE], 747}, ColumnMetaData{SNAPPY [c_date] BINARY
 [PLAIN, PLAIN_DICTIONARY, RLE], 900}, ColumnMetaData{SNAPPY [c_time] BINARY  [PLAIN, PLAIN_DICTIONARY,
RLE], 1045}, ColumnMetaData{SNAPPY [c_timestamp] INT96  [PLAIN, PLAIN_DICTIONARY, RLE], 1213},
ColumnMetaData{SNAPPY [c_boolean] BOOLEAN  [PLAIN, PLAIN_DICTIONARY, RLE], 1293}, ColumnMetaData{SNAPPY
[d9] DOUBLE  [PLAIN, PLAIN_DICTIONARY, RLE], 1448}, ColumnMetaData{SNAPPY [d18] DOUBLE  [PLAIN,
PLAIN_DICTIONARY, RLE], 1609}, ColumnMetaData{SNAPPY [d28] DOUBLE  [PLAIN, PLAIN_DICTIONARY,
RLE], 1771}, ColumnMetaData{SNAPPY [d38] DOUBLE  [PLAIN, PLAIN_DICTIONARY, RLE], 1933}]}]}
>         at org.apache.drill.exec.store.parquet.columnreaders.ParquetRecordReader.handleAndRaise(ParquetRecordReader.java:346)
~[drill-java-exec-1.2.0-SNAPSHOT.jar:1.2.0-SNAPSHOT]
>         at org.apache.drill.exec.store.parquet.columnreaders.ParquetRecordReader.next(ParquetRecordReader.java:448)
~[drill-java-exec-1.2.0-SNAPSHOT.jar:1.2.0-SNAPSHOT]
>         at org.apache.drill.exec.physical.impl.ScanBatch.next(ScanBatch.java:183) ~[drill-java-exec-1.2.0-SNAPSHOT.jar:1.2.0-SNAPSHOT]
>         at org.apache.drill.exec.physical.impl.validate.IteratorValidatorBatchIterator.next(IteratorValidatorBatchIterator.java:118)
~[drill-java-exec-1.2.0-SNAPSHOT.jar:1.2.0-SNAPSHOT]
>         at org.apache.drill.exec.record.AbstractRecordBatch.next(AbstractRecordBatch.java:104)
~[drill-java-exec-1.2.0-SNAPSHOT.jar:1.2.0-SNAPSHOT]
>         at org.apache.drill.exec.record.AbstractRecordBatch.next(AbstractRecordBatch.java:94)
~[drill-java-exec-1.2.0-SNAPSHOT.jar:1.2.0-SNAPSHOT]
>         at org.apache.drill.exec.record.AbstractSingleRecordBatch.innerNext(AbstractSingleRecordBatch.java:51)
~[drill-java-exec-1.2.0-SNAPSHOT.jar:1.2.0-SNAPSHOT]
>         at org.apache.drill.exec.physical.impl.project.ProjectRecordBatch.innerNext(ProjectRecordBatch.java:129)
~[drill-java-exec-1.2.0-SNAPSHOT.jar:1.2.0-SNAPSHOT]
>         at org.apache.drill.exec.record.AbstractRecordBatch.next(AbstractRecordBatch.java:147)
~[drill-java-exec-1.2.0-SNAPSHOT.jar:1.2.0-SNAPSHOT]
>         at org.apache.drill.exec.physical.impl.validate.IteratorValidatorBatchIterator.next(IteratorValidatorBatchIterator.java:118)
~[drill-java-exec-1.2.0-SNAPSHOT.jar:1.2.0-SNAPSHOT]
>         at org.apache.drill.exec.record.AbstractRecordBatch.next(AbstractRecordBatch.java:104)
~[drill-java-exec-1.2.0-SNAPSHOT.jar:1.2.0-SNAPSHOT]
>         at org.apache.drill.exec.physical.impl.join.HashJoinBatch.executeBuildPhase(HashJoinBatch.java:403)
~[drill-java-exec-1.2.0-SNAPSHOT.jar:1.2.0-SNAPSHOT]
>         at org.apache.drill.exec.physical.impl.join.HashJoinBatch.innerNext(HashJoinBatch.java:218)
~[drill-java-exec-1.2.0-SNAPSHOT.jar:1.2.0-SNAPSHOT]
>         at org.apache.drill.exec.record.AbstractRecordBatch.next(AbstractRecordBatch.java:147)
~[drill-java-exec-1.2.0-SNAPSHOT.jar:1.2.0-SNAPSHOT]
>         at org.apache.drill.exec.physical.impl.validate.IteratorValidatorBatchIterator.next(IteratorValidatorBatchIterator.java:118)
~[drill-java-exec-1.2.0-SNAPSHOT.jar:1.2.0-SNAPSHOT]
>         at org.apache.drill.exec.record.AbstractRecordBatch.next(AbstractRecordBatch.java:104)
~[drill-java-exec-1.2.0-SNAPSHOT.jar:1.2.0-SNAPSHOT]
>         at org.apache.drill.exec.record.AbstractRecordBatch.next(AbstractRecordBatch.java:94)
~[drill-java-exec-1.2.0-SNAPSHOT.jar:1.2.0-SNAPSHOT]
>         at org.apache.drill.exec.record.AbstractSingleRecordBatch.innerNext(AbstractSingleRecordBatch.java:51)
~[drill-java-exec-1.2.0-SNAPSHOT.jar:1.2.0-SNAPSHOT]
>         at org.apache.drill.exec.physical.impl.project.ProjectRecordBatch.innerNext(ProjectRecordBatch.java:129)
~[drill-java-exec-1.2.0-SNAPSHOT.jar:1.2.0-SNAPSHOT]
>         at org.apache.drill.exec.record.AbstractRecordBatch.next(AbstractRecordBatch.java:147)
~[drill-java-exec-1.2.0-SNAPSHOT.jar:1.2.0-SNAPSHOT]
>         at org.apache.drill.exec.physical.impl.validate.IteratorValidatorBatchIterator.next(IteratorValidatorBatchIterator.java:118)
~[drill-java-exec-1.2.0-SNAPSHOT.jar:1.2.0-SNAPSHOT]
>         at org.apache.drill.exec.record.AbstractRecordBatch.next(AbstractRecordBatch.java:104)
~[drill-java-exec-1.2.0-SNAPSHOT.jar:1.2.0-SNAPSHOT]
>         at org.apache.drill.exec.record.AbstractRecordBatch.next(AbstractRecordBatch.java:94)
~[drill-java-exec-1.2.0-SNAPSHOT.jar:1.2.0-SNAPSHOT]
>         at org.apache.drill.exec.physical.impl.aggregate.StreamingAggBatch.innerNext(StreamingAggBatch.java:136)
~[drill-java-exec-1.2.0-SNAPSHOT.jar:1.2.0-SNAPSHOT]
>         at org.apache.drill.exec.record.AbstractRecordBatch.next(AbstractRecordBatch.java:147)
~[drill-java-exec-1.2.0-SNAPSHOT.jar:1.2.0-SNAPSHOT]
>         at org.apache.drill.exec.physical.impl.validate.IteratorValidatorBatchIterator.next(IteratorValidatorBatchIterator.java:118)
~[drill-java-exec-1.2.0-SNAPSHOT.jar:1.2.0-SNAPSHOT]
>         at org.apache.drill.exec.record.AbstractRecordBatch.next(AbstractRecordBatch.java:104)
~[drill-java-exec-1.2.0-SNAPSHOT.jar:1.2.0-SNAPSHOT]
>         at org.apache.drill.exec.record.AbstractRecordBatch.next(AbstractRecordBatch.java:94)
~[drill-java-exec-1.2.0-SNAPSHOT.jar:1.2.0-SNAPSHOT]
>         at org.apache.drill.exec.record.AbstractSingleRecordBatch.innerNext(AbstractSingleRecordBatch.java:51)
~[drill-java-exec-1.2.0-SNAPSHOT.jar:1.2.0-SNAPSHOT]
>         at org.apache.drill.exec.physical.impl.project.ProjectRecordBatch.innerNext(ProjectRecordBatch.java:129)
~[drill-java-exec-1.2.0-SNAPSHOT.jar:1.2.0-SNAPSHOT]
>         at org.apache.drill.exec.record.AbstractRecordBatch.next(AbstractRecordBatch.java:147)
~[drill-java-exec-1.2.0-SNAPSHOT.jar:1.2.0-SNAPSHOT]
>         at org.apache.drill.exec.physical.impl.validate.IteratorValidatorBatchIterator.next(IteratorValidatorBatchIterator.java:118)
~[drill-java-exec-1.2.0-SNAPSHOT.jar:1.2.0-SNAPSHOT]
>         at org.apache.drill.exec.physical.impl.BaseRootExec.next(BaseRootExec.java:83)
~[drill-java-exec-1.2.0-SNAPSHOT.jar:1.2.0-SNAPSHOT]
>         at org.apache.drill.exec.physical.impl.ScreenCreator$ScreenRoot.innerNext(ScreenCreator.java:80)
~[drill-java-exec-1.2.0-SNAPSHOT.jar:1.2.0-SNAPSHOT]
>         at org.apache.drill.exec.physical.impl.BaseRootExec.next(BaseRootExec.java:73)
~[drill-java-exec-1.2.0-SNAPSHOT.jar:1.2.0-SNAPSHOT]
>         at org.apache.drill.exec.work.fragment.FragmentExecutor$1.run(FragmentExecutor.java:258)
~[drill-java-exec-1.2.0-SNAPSHOT.jar:1.2.0-SNAPSHOT]
>         at org.apache.drill.exec.work.fragment.FragmentExecutor$1.run(FragmentExecutor.java:252)
~[drill-java-exec-1.2.0-SNAPSHOT.jar:1.2.0-SNAPSHOT]
>         at java.security.AccessController.doPrivileged(Native Method) ~[na:1.7.0_71]
>         at javax.security.auth.Subject.doAs(Subject.java:415) ~[na:1.7.0_71]
>         at org.apache.hadoop.security.UserGroupInformation.doAs(UserGroupInformation.java:1566)
~[hadoop-common-2.5.1-mapr-1503.jar:na]
>         at org.apache.drill.exec.work.fragment.FragmentExecutor.run(FragmentExecutor.java:252)
[drill-java-exec-1.2.0-SNAPSHOT.jar:1.2.0-SNAPSHOT]
>         ... 4 common frames omitted
> Caused by: java.io.IOException: can not read class parquet.format.PageHeader: Required
field 'uncompressed_page_size' was not found in serialized data! Struct: PageHeader(type:null,
uncompressed_page_size:0, compressed_page_size:0)
>         at parquet.format.Util.read(Util.java:50) ~[parquet-format-2.1.1-drill-r1.jar:na]
>         at parquet.format.Util.readPageHeader(Util.java:26) ~[parquet-format-2.1.1-drill-r1.jar:na]
>         at org.apache.drill.exec.store.parquet.ColumnDataReader.readPageHeader(ColumnDataReader.java:46)
~[drill-java-exec-1.2.0-SNAPSHOT.jar:1.2.0-SNAPSHOT]
>         at org.apache.drill.exec.store.parquet.columnreaders.PageReader.next(PageReader.java:191)
~[drill-java-exec-1.2.0-SNAPSHOT.jar:1.2.0-SNAPSHOT]
>         at org.apache.drill.exec.store.parquet.columnreaders.NullableColumnReader.processPages(NullableColumnReader.java:76)
~[drill-java-exec-1.2.0-SNAPSHOT.jar:1.2.0-SNAPSHOT]
>         at org.apache.drill.exec.store.parquet.columnreaders.ParquetRecordReader.readAllFixedFields(ParquetRecordReader.java:387)
~[drill-java-exec-1.2.0-SNAPSHOT.jar:1.2.0-SNAPSHOT]
>         at org.apache.drill.exec.store.parquet.columnreaders.ParquetRecordReader.next(ParquetRecordReader.java:430)
~[drill-java-exec-1.2.0-SNAPSHOT.jar:1.2.0-SNAPSHOT]
>         ... 43 common frames omitted
> Caused by: parquet.org.apache.thrift.protocol.TProtocolException: Required field 'uncompressed_page_size'
was not found in serialized data! Struct: PageHeader(type:null, uncompressed_page_size:0,
compressed_page_size:0)
>         at parquet.format.PageHeader.read(PageHeader.java:905) ~[parquet-format-2.1.1-drill-r1.jar:na]
>         at parquet.format.Util.read(Util.java:47) ~[parquet-format-2.1.1-drill-r1.jar:na]
>         ... 49 common frames omitted
> 2015-09-30 21:15:45,951 [BitServer-4] WARN  o.a.drill.exec.work.foreman.Foreman - Dropping
request to move to COMPLETED state as query is already at FAILED state (which is terminal).
> {code}



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)

Mime
View raw message