drill-issues mailing list archives

From "ASF GitHub Bot (JIRA)" <j...@apache.org>
Subject [jira] [Commented] (DRILL-3871) Exception on inner join when join predicate is int96 field generated by impala
Date Mon, 26 Oct 2015 17:39:27 GMT

    [ https://issues.apache.org/jira/browse/DRILL-3871?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14974617#comment-14974617 ]

ASF GitHub Bot commented on DRILL-3871:
---------------------------------------

GitHub user parthchandra opened a pull request:

    https://github.com/apache/drill/pull/219

    DRILL-3871: Off by one error while reading binary fields with one terminal null in parquet.
    
    Changes -
      1) Rewrote the NullableColumnReader.processPages function to process runs of null values
         and non-null values without needing to keep track of whether the previous iteration of
         the while loop had encountered a null. A pair of loops now iterates over a run of nulls
         or a run of non-null values.
      2) Removed some redundant code.
      3) Renamed some variables. For clarity, indexInOutputVector is now replaced by two local
         variables, readCount and writeCount.
      4) Added tracing.
      5) Added unit tests for edge cases of nulls occurring on page boundaries.
    
    For all the unit tests and the TPC-H and TPC-DS test data sets, the state of the
    NullableColumnReader at the end of each iteration of processPages is identical to that
    produced by the old code. In addition, the boundary conditions are now handled.
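
    The run-based iteration described in change 1 can be illustrated with a minimal sketch.
    This is not the code from the pull request: the class NullRunSketch, the method expand,
    and the boolean[]/int[] inputs are hypothetical stand-ins, and the roles given to
    readCount and writeCount here are an assumption based only on their names. The sketch
    shows how a pair of inner loops can consume a run of nulls and a run of non-null values
    without carrying a "previous value was null" flag between iterations.

```java
import java.util.Arrays;

// Hypothetical illustration only; not Drill's NullableColumnReader.
class NullRunSketch {

    // Expands densely packed non-null values into an output with null slots.
    // isNull[i] says whether output slot i is null; packedValues holds only
    // the non-null values, in order.
    static Integer[] expand(boolean[] isNull, int[] packedValues) {
        Integer[] out = new Integer[isNull.length];
        int readCount = 0;   // values consumed from packedValues (assumed role)
        int writeCount = 0;  // output slots filled so far (assumed role)

        while (writeCount < isNull.length) {
            // Run of nulls: only the write position advances.
            while (writeCount < isNull.length && isNull[writeCount]) {
                out[writeCount++] = null;
            }
            // Run of non-nulls: one packed value is copied per output slot.
            while (writeCount < isNull.length && !isNull[writeCount]) {
                out[writeCount++] = packedValues[readCount++];
            }
        }
        return out;
    }

    public static void main(String[] args) {
        boolean[] isNull = {false, false, true, true, false, true};
        int[] packed = {10, 20, 30};
        // Prints [10, 20, null, null, 30, null]
        System.out.println(Arrays.toString(expand(isNull, packed)));
    }
}
```

    Because both inner loops re-check the bound, a run that ends exactly at the last slot
    (for example, a single terminal null) terminates without reading past the packed values.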


You can merge this pull request into a Git repository by running:

    $ git pull https://github.com/parthchandra/incubator-drill DRILL-3871

Alternatively you can review and apply these changes as the patch at:

    https://github.com/apache/drill/pull/219.patch

To close this pull request, make a commit to your master/trunk branch
with (at least) the following in the commit message:

    This closes #219
    
----
commit d23ceb2a4c32da9535f1e482c4c70fcc31b8b2b8
Author: Parth Chandra <parthc@apache.org>
Date:   2015-10-05T17:25:56Z

    DRILL-3871: Off by one error while reading binary fields with one terminal null in parquet.

----


> Exception on inner join when join predicate is int96 field generated by impala
> ------------------------------------------------------------------------------
>
>                 Key: DRILL-3871
>                 URL: https://issues.apache.org/jira/browse/DRILL-3871
>             Project: Apache Drill
>          Issue Type: Bug
>          Components: Execution - Data Types
>    Affects Versions: 1.2.0
>            Reporter: Victoria Markman
>            Assignee: Parth Chandra
>            Priority: Critical
>              Labels: int96
>             Fix For: 1.3.0
>
>         Attachments: tables.tar
>
>
> Both tables in the join were created by Impala, with column c_timestamp being Parquet int96.
> {code}
> 0: jdbc:drill:schema=dfs> select
> . . . . . . . . . . . . >         max(t1.c_timestamp),
> . . . . . . . . . . . . >         min(t1.c_timestamp),
> . . . . . . . . . . . . >         count(t1.c_timestamp)
> . . . . . . . . . . . . > from
> . . . . . . . . . . . . >         imp_t1 t1
> . . . . . . . . . . . . >                 inner join
> . . . . . . . . . . . . >         imp_t2 t2
> . . . . . . . . . . . . > on      (t1.c_timestamp = t2.c_timestamp)
> . . . . . . . . . . . . > ;
> java.lang.RuntimeException: java.sql.SQLException: SYSTEM ERROR: TProtocolException:
Required field 'uncompressed_page_size' was not found in serialized data! Struct: PageHeader(type:null,
uncompressed_page_size:0, compressed_page_size:0)
> Fragment 0:0
> [Error Id: eb6a5df8-fc59-409b-957a-59cb1079b5b8 on atsqa4-133.qa.lab:31010]
>         at sqlline.IncrementalRows.hasNext(IncrementalRows.java:73)
>         at sqlline.TableOutputFormat$ResizingRowsProvider.next(TableOutputFormat.java:87)
>         at sqlline.TableOutputFormat.print(TableOutputFormat.java:118)
>         at sqlline.SqlLine.print(SqlLine.java:1583)
>         at sqlline.Commands.execute(Commands.java:852)
>         at sqlline.Commands.sql(Commands.java:751)
>         at sqlline.SqlLine.dispatch(SqlLine.java:738)
>         at sqlline.SqlLine.begin(SqlLine.java:612)
>         at sqlline.SqlLine.start(SqlLine.java:366)
>         at sqlline.SqlLine.main(SqlLine.java:259)
> {code}
> drillbit.log
> {code}
> 2015-09-30 21:15:45,710 [29f3aefe-3209-a6e6-0418-500dac60a339:foreman] INFO  o.a.d.exec.store.parquet.Metadata
- Took 0 ms to get file statuses
> 2015-09-30 21:15:45,712 [29f3aefe-3209-a6e6-0418-500dac60a339:foreman] INFO  o.a.d.exec.store.parquet.Metadata
- Fetch parquet metadata: Executed 1 out of 1 using 1 threads. Time: 1ms total, 1.645381ms
avg, 1ms max.
> 2015-09-30 21:15:45,712 [29f3aefe-3209-a6e6-0418-500dac60a339:foreman] INFO  o.a.d.exec.store.parquet.Metadata
- Fetch parquet metadata: Executed 1 out of 1 using 1 threads. Earliest start: 1.332000 μs,
Latest start: 1.332000 μs, Average start: 1.332000 μs .
> 2015-09-30 21:15:45,830 [29f3aefe-3209-a6e6-0418-500dac60a339:frag:0:0] INFO  o.a.d.e.w.fragment.FragmentExecutor
- 29f3aefe-3209-a6e6-0418-500dac60a339:0:0: State change requested AWAITING_ALLOCATION -->
RUNNING
> 2015-09-30 21:15:45,830 [29f3aefe-3209-a6e6-0418-500dac60a339:frag:0:0] INFO  o.a.d.e.w.f.FragmentStatusReporter
- 29f3aefe-3209-a6e6-0418-500dac60a339:0:0: State to report: RUNNING
> 2015-09-30 21:15:45,925 [29f3aefe-3209-a6e6-0418-500dac60a339:frag:0:0] INFO  o.a.d.e.w.fragment.FragmentExecutor
- 29f3aefe-3209-a6e6-0418-500dac60a339:0:0: State change requested RUNNING --> FAILED
> 2015-09-30 21:15:45,930 [29f3aefe-3209-a6e6-0418-500dac60a339:frag:0:0] INFO  o.a.d.e.w.fragment.FragmentExecutor
- 29f3aefe-3209-a6e6-0418-500dac60a339:0:0: State change requested FAILED --> FINISHED
> 2015-09-30 21:15:45,931 [29f3aefe-3209-a6e6-0418-500dac60a339:frag:0:0] ERROR o.a.d.e.w.fragment.FragmentExecutor
- SYSTEM ERROR: TProtocolException: Required field 'uncompressed_page_size' was not found
in serialized data! Struct: PageHeader(type:null, uncompressed_page_size:0, compressed_page_size:0)
> Fragment 0:0
> [Error Id: eb6a5df8-fc59-409b-957a-59cb1079b5b8 on atsqa4-133.qa.lab:31010]
> org.apache.drill.common.exceptions.UserException: SYSTEM ERROR: TProtocolException: Required
field 'uncompressed_page_size' was not found in serialized data! Struct: PageHeader(type:null,
uncompressed_page_size:0, compressed_page_size:0)
> Fragment 0:0
> [Error Id: eb6a5df8-fc59-409b-957a-59cb1079b5b8 on atsqa4-133.qa.lab:31010]
>         at org.apache.drill.common.exceptions.UserException$Builder.build(UserException.java:534)
~[drill-common-1.2.0-SNAPSHOT.jar:1.2.0-SNAPSHOT]
>         at org.apache.drill.exec.work.fragment.FragmentExecutor.sendFinalState(FragmentExecutor.java:323)
[drill-java-exec-1.2.0-SNAPSHOT.jar:1.2.0-SNAPSHOT]
>         at org.apache.drill.exec.work.fragment.FragmentExecutor.cleanup(FragmentExecutor.java:178)
[drill-java-exec-1.2.0-SNAPSHOT.jar:1.2.0-SNAPSHOT]
>         at org.apache.drill.exec.work.fragment.FragmentExecutor.run(FragmentExecutor.java:292)
[drill-java-exec-1.2.0-SNAPSHOT.jar:1.2.0-SNAPSHOT]
>         at org.apache.drill.common.SelfCleaningRunnable.run(SelfCleaningRunnable.java:38)
[drill-common-1.2.0-SNAPSHOT.jar:1.2.0-SNAPSHOT]
>         at java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1145)
[na:1.7.0_71]
>         at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:615)
[na:1.7.0_71]
>         at java.lang.Thread.run(Thread.java:745) [na:1.7.0_71]
> Caused by: org.apache.drill.common.exceptions.DrillRuntimeException: Error in parquet
record reader.
> Message:
> Hadoop path: /drill/testdata/subqueries/imp_t2/bf4261140dac8d45-814d66b86bf960b8_853027779_data.0.parq
> Total records read: 10
> Mock records read: 0
> Records to read: 1
> Row group index: 0
> Records in row group: 10
> Parquet Metadata: ParquetMetaData{FileMetaData{schema: message schema {
>   optional binary c_varchar (UTF8);
>   optional int32 c_integer;
>   optional int64 c_bigint;
>   optional float c_float;
>   optional double c_double;
>   optional binary c_date (UTF8);
>   optional binary c_time (UTF8);
>   optional int96 c_timestamp;
>   optional boolean c_boolean;
>   optional double d9;
>   optional double d18;
>   optional double d28;
>   optional double d38;
> }
> , metadata: {}}, blocks: [BlockMetaData{10, 1507 [ColumnMetaData{SNAPPY [c_varchar] BINARY
 [PLAIN, PLAIN_DICTIONARY, RLE], 173}, ColumnMetaData{SNAPPY [c_integer] INT32  [PLAIN, PLAIN_DICTIONARY,
RLE], 299}, ColumnMetaData{SNAPPY [c_bigint] INT64  [PLAIN, PLAIN_DICTIONARY, RLE], 453},
ColumnMetaData{SNAPPY [c_float] FLOAT  [PLAIN, PLAIN_DICTIONARY, RLE], 581}, ColumnMetaData{SNAPPY
[c_double] DOUBLE  [PLAIN, PLAIN_DICTIONARY, RLE], 747}, ColumnMetaData{SNAPPY [c_date] BINARY
 [PLAIN, PLAIN_DICTIONARY, RLE], 900}, ColumnMetaData{SNAPPY [c_time] BINARY  [PLAIN, PLAIN_DICTIONARY,
RLE], 1045}, ColumnMetaData{SNAPPY [c_timestamp] INT96  [PLAIN, PLAIN_DICTIONARY, RLE], 1213},
ColumnMetaData{SNAPPY [c_boolean] BOOLEAN  [PLAIN, PLAIN_DICTIONARY, RLE], 1293}, ColumnMetaData{SNAPPY
[d9] DOUBLE  [PLAIN, PLAIN_DICTIONARY, RLE], 1448}, ColumnMetaData{SNAPPY [d18] DOUBLE  [PLAIN,
PLAIN_DICTIONARY, RLE], 1609}, ColumnMetaData{SNAPPY [d28] DOUBLE  [PLAIN, PLAIN_DICTIONARY,
RLE], 1771}, ColumnMetaData{SNAPPY [d38] DOUBLE  [PLAIN, PLAIN_DICTIONARY, RLE], 1933}]}]}
>         at org.apache.drill.exec.store.parquet.columnreaders.ParquetRecordReader.handleAndRaise(ParquetRecordReader.java:346)
~[drill-java-exec-1.2.0-SNAPSHOT.jar:1.2.0-SNAPSHOT]
>         at org.apache.drill.exec.store.parquet.columnreaders.ParquetRecordReader.next(ParquetRecordReader.java:448)
~[drill-java-exec-1.2.0-SNAPSHOT.jar:1.2.0-SNAPSHOT]
>         at org.apache.drill.exec.physical.impl.ScanBatch.next(ScanBatch.java:183) ~[drill-java-exec-1.2.0-SNAPSHOT.jar:1.2.0-SNAPSHOT]
>         at org.apache.drill.exec.physical.impl.validate.IteratorValidatorBatchIterator.next(IteratorValidatorBatchIterator.java:118)
~[drill-java-exec-1.2.0-SNAPSHOT.jar:1.2.0-SNAPSHOT]
>         at org.apache.drill.exec.record.AbstractRecordBatch.next(AbstractRecordBatch.java:104)
~[drill-java-exec-1.2.0-SNAPSHOT.jar:1.2.0-SNAPSHOT]
>         at org.apache.drill.exec.record.AbstractRecordBatch.next(AbstractRecordBatch.java:94)
~[drill-java-exec-1.2.0-SNAPSHOT.jar:1.2.0-SNAPSHOT]
>         at org.apache.drill.exec.record.AbstractSingleRecordBatch.innerNext(AbstractSingleRecordBatch.java:51)
~[drill-java-exec-1.2.0-SNAPSHOT.jar:1.2.0-SNAPSHOT]
>         at org.apache.drill.exec.physical.impl.project.ProjectRecordBatch.innerNext(ProjectRecordBatch.java:129)
~[drill-java-exec-1.2.0-SNAPSHOT.jar:1.2.0-SNAPSHOT]
>         at org.apache.drill.exec.record.AbstractRecordBatch.next(AbstractRecordBatch.java:147)
~[drill-java-exec-1.2.0-SNAPSHOT.jar:1.2.0-SNAPSHOT]
>         at org.apache.drill.exec.physical.impl.validate.IteratorValidatorBatchIterator.next(IteratorValidatorBatchIterator.java:118)
~[drill-java-exec-1.2.0-SNAPSHOT.jar:1.2.0-SNAPSHOT]
>         at org.apache.drill.exec.record.AbstractRecordBatch.next(AbstractRecordBatch.java:104)
~[drill-java-exec-1.2.0-SNAPSHOT.jar:1.2.0-SNAPSHOT]
>         at org.apache.drill.exec.physical.impl.join.HashJoinBatch.executeBuildPhase(HashJoinBatch.java:403)
~[drill-java-exec-1.2.0-SNAPSHOT.jar:1.2.0-SNAPSHOT]
>         at org.apache.drill.exec.physical.impl.join.HashJoinBatch.innerNext(HashJoinBatch.java:218)
~[drill-java-exec-1.2.0-SNAPSHOT.jar:1.2.0-SNAPSHOT]
>         at org.apache.drill.exec.record.AbstractRecordBatch.next(AbstractRecordBatch.java:147)
~[drill-java-exec-1.2.0-SNAPSHOT.jar:1.2.0-SNAPSHOT]
>         at org.apache.drill.exec.physical.impl.validate.IteratorValidatorBatchIterator.next(IteratorValidatorBatchIterator.java:118)
~[drill-java-exec-1.2.0-SNAPSHOT.jar:1.2.0-SNAPSHOT]
>         at org.apache.drill.exec.record.AbstractRecordBatch.next(AbstractRecordBatch.java:104)
~[drill-java-exec-1.2.0-SNAPSHOT.jar:1.2.0-SNAPSHOT]
>         at org.apache.drill.exec.record.AbstractRecordBatch.next(AbstractRecordBatch.java:94)
~[drill-java-exec-1.2.0-SNAPSHOT.jar:1.2.0-SNAPSHOT]
>         at org.apache.drill.exec.record.AbstractSingleRecordBatch.innerNext(AbstractSingleRecordBatch.java:51)
~[drill-java-exec-1.2.0-SNAPSHOT.jar:1.2.0-SNAPSHOT]
>         at org.apache.drill.exec.physical.impl.project.ProjectRecordBatch.innerNext(ProjectRecordBatch.java:129)
~[drill-java-exec-1.2.0-SNAPSHOT.jar:1.2.0-SNAPSHOT]
>         at org.apache.drill.exec.record.AbstractRecordBatch.next(AbstractRecordBatch.java:147)
~[drill-java-exec-1.2.0-SNAPSHOT.jar:1.2.0-SNAPSHOT]
>         at org.apache.drill.exec.physical.impl.validate.IteratorValidatorBatchIterator.next(IteratorValidatorBatchIterator.java:118)
~[drill-java-exec-1.2.0-SNAPSHOT.jar:1.2.0-SNAPSHOT]
>         at org.apache.drill.exec.record.AbstractRecordBatch.next(AbstractRecordBatch.java:104)
~[drill-java-exec-1.2.0-SNAPSHOT.jar:1.2.0-SNAPSHOT]
>         at org.apache.drill.exec.record.AbstractRecordBatch.next(AbstractRecordBatch.java:94)
~[drill-java-exec-1.2.0-SNAPSHOT.jar:1.2.0-SNAPSHOT]
>         at org.apache.drill.exec.physical.impl.aggregate.StreamingAggBatch.innerNext(StreamingAggBatch.java:136)
~[drill-java-exec-1.2.0-SNAPSHOT.jar:1.2.0-SNAPSHOT]
>         at org.apache.drill.exec.record.AbstractRecordBatch.next(AbstractRecordBatch.java:147)
~[drill-java-exec-1.2.0-SNAPSHOT.jar:1.2.0-SNAPSHOT]
>         at org.apache.drill.exec.physical.impl.validate.IteratorValidatorBatchIterator.next(IteratorValidatorBatchIterator.java:118)
~[drill-java-exec-1.2.0-SNAPSHOT.jar:1.2.0-SNAPSHOT]
>         at org.apache.drill.exec.record.AbstractRecordBatch.next(AbstractRecordBatch.java:104)
~[drill-java-exec-1.2.0-SNAPSHOT.jar:1.2.0-SNAPSHOT]
>         at org.apache.drill.exec.record.AbstractRecordBatch.next(AbstractRecordBatch.java:94)
~[drill-java-exec-1.2.0-SNAPSHOT.jar:1.2.0-SNAPSHOT]
>         at org.apache.drill.exec.record.AbstractSingleRecordBatch.innerNext(AbstractSingleRecordBatch.java:51)
~[drill-java-exec-1.2.0-SNAPSHOT.jar:1.2.0-SNAPSHOT]
>         at org.apache.drill.exec.physical.impl.project.ProjectRecordBatch.innerNext(ProjectRecordBatch.java:129)
~[drill-java-exec-1.2.0-SNAPSHOT.jar:1.2.0-SNAPSHOT]
>         at org.apache.drill.exec.record.AbstractRecordBatch.next(AbstractRecordBatch.java:147)
~[drill-java-exec-1.2.0-SNAPSHOT.jar:1.2.0-SNAPSHOT]
>         at org.apache.drill.exec.physical.impl.validate.IteratorValidatorBatchIterator.next(IteratorValidatorBatchIterator.java:118)
~[drill-java-exec-1.2.0-SNAPSHOT.jar:1.2.0-SNAPSHOT]
>         at org.apache.drill.exec.physical.impl.BaseRootExec.next(BaseRootExec.java:83)
~[drill-java-exec-1.2.0-SNAPSHOT.jar:1.2.0-SNAPSHOT]
>         at org.apache.drill.exec.physical.impl.ScreenCreator$ScreenRoot.innerNext(ScreenCreator.java:80)
~[drill-java-exec-1.2.0-SNAPSHOT.jar:1.2.0-SNAPSHOT]
>         at org.apache.drill.exec.physical.impl.BaseRootExec.next(BaseRootExec.java:73)
~[drill-java-exec-1.2.0-SNAPSHOT.jar:1.2.0-SNAPSHOT]
>         at org.apache.drill.exec.work.fragment.FragmentExecutor$1.run(FragmentExecutor.java:258)
~[drill-java-exec-1.2.0-SNAPSHOT.jar:1.2.0-SNAPSHOT]
>         at org.apache.drill.exec.work.fragment.FragmentExecutor$1.run(FragmentExecutor.java:252)
~[drill-java-exec-1.2.0-SNAPSHOT.jar:1.2.0-SNAPSHOT]
>         at java.security.AccessController.doPrivileged(Native Method) ~[na:1.7.0_71]
>         at javax.security.auth.Subject.doAs(Subject.java:415) ~[na:1.7.0_71]
>         at org.apache.hadoop.security.UserGroupInformation.doAs(UserGroupInformation.java:1566)
~[hadoop-common-2.5.1-mapr-1503.jar:na]
>         at org.apache.drill.exec.work.fragment.FragmentExecutor.run(FragmentExecutor.java:252)
[drill-java-exec-1.2.0-SNAPSHOT.jar:1.2.0-SNAPSHOT]
>         ... 4 common frames omitted
> Caused by: java.io.IOException: can not read class parquet.format.PageHeader: Required
field 'uncompressed_page_size' was not found in serialized data! Struct: PageHeader(type:null,
uncompressed_page_size:0, compressed_page_size:0)
>         at parquet.format.Util.read(Util.java:50) ~[parquet-format-2.1.1-drill-r1.jar:na]
>         at parquet.format.Util.readPageHeader(Util.java:26) ~[parquet-format-2.1.1-drill-r1.jar:na]
>         at org.apache.drill.exec.store.parquet.ColumnDataReader.readPageHeader(ColumnDataReader.java:46)
~[drill-java-exec-1.2.0-SNAPSHOT.jar:1.2.0-SNAPSHOT]
>         at org.apache.drill.exec.store.parquet.columnreaders.PageReader.next(PageReader.java:191)
~[drill-java-exec-1.2.0-SNAPSHOT.jar:1.2.0-SNAPSHOT]
>         at org.apache.drill.exec.store.parquet.columnreaders.NullableColumnReader.processPages(NullableColumnReader.java:76)
~[drill-java-exec-1.2.0-SNAPSHOT.jar:1.2.0-SNAPSHOT]
>         at org.apache.drill.exec.store.parquet.columnreaders.ParquetRecordReader.readAllFixedFields(ParquetRecordReader.java:387)
~[drill-java-exec-1.2.0-SNAPSHOT.jar:1.2.0-SNAPSHOT]
>         at org.apache.drill.exec.store.parquet.columnreaders.ParquetRecordReader.next(ParquetRecordReader.java:430)
~[drill-java-exec-1.2.0-SNAPSHOT.jar:1.2.0-SNAPSHOT]
>         ... 43 common frames omitted
> Caused by: parquet.org.apache.thrift.protocol.TProtocolException: Required field 'uncompressed_page_size'
was not found in serialized data! Struct: PageHeader(type:null, uncompressed_page_size:0,
compressed_page_size:0)
>         at parquet.format.PageHeader.read(PageHeader.java:905) ~[parquet-format-2.1.1-drill-r1.jar:na]
>         at parquet.format.Util.read(Util.java:47) ~[parquet-format-2.1.1-drill-r1.jar:na]
>         ... 49 common frames omitted
> 2015-09-30 21:15:45,951 [BitServer-4] WARN  o.a.drill.exec.work.foreman.Foreman - Dropping
request to move to COMPLETED state as query is already at FAILED state (which is terminal).
> {code}



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)
