drill-issues mailing list archives

From "Artavazd Balaian (JIRA)" <j...@apache.org>
Subject [jira] [Updated] (DRILL-7249) java.lang.ArrayIndexOutOfBoundsException when query Parquet file
Date Sun, 12 May 2019 08:28:00 GMT

     [ https://issues.apache.org/jira/browse/DRILL-7249?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
]

Artavazd Balaian updated DRILL-7249:
------------------------------------
    Description: 
Environment:
{code:java}
C:\repos\drill\distribution\src\resources>java -version
java version "1.8.0_211"
Java(TM) SE Runtime Environment (build 1.8.0_211-b12)
Java HotSpot(TM) 64-Bit Server VM (build 25.211-b12, mixed mode)

C:\repos\drill\distribution\src\resources>ver

Microsoft Windows [Version 10.0.17134.706]

C:\repos\drill\distribution\src\resources>
{code}
Link to the Parquet file: [https://drive.google.com/open?id=1uDbAS_yFQFLrRX-9wwDVmNGuYVDDO2UA]

Master branch, commit 0195d1f34be7fd385ba76d2fd3e14a9fa13bd375, run in IntelliJ IDEA (org.apache.drill.exec.server.Drillbit).

I use DBeaver 6.0.3 to connect to the local Apache Drill instance. When I run the following query:
{code:java}
SELECT * FROM dfs.`/temp/BeamRegression/link_stat_7.parquet`
ORDER BY enter_time
{code}
It fails with:
{code:java}
org.jkiss.dbeaver.model.exec.DBCException: SQL Error: DATA_READ ERROR: Error reading from
Parquet file
File: /temp/BeamRegression/link_stat_7.parquet
Column: L7_StdVeh_OutLinks
Row Group Start: 831052361
Fragment 3:4

[Error Id: dee97d7d-68da-4670-bf57-ee83a60abbf3 on DESKTOP-FONP7QD:31010]
at org.jkiss.dbeaver.model.impl.jdbc.exec.JDBCResultSetImpl.nextRow(JDBCResultSetImpl.java:180)
at org.jkiss.dbeaver.ui.editors.sql.execute.SQLQueryJob.fetchQueryData(SQLQueryJob.java:744)
at org.jkiss.dbeaver.ui.editors.sql.execute.SQLQueryJob.executeStatement(SQLQueryJob.java:484)
at org.jkiss.dbeaver.ui.editors.sql.execute.SQLQueryJob.lambda$0(SQLQueryJob.java:407)
at org.jkiss.dbeaver.model.DBUtils.tryExecuteRecover(DBUtils.java:1684)
at org.jkiss.dbeaver.ui.editors.sql.execute.SQLQueryJob.executeSingleQuery(SQLQueryJob.java:405)
at org.jkiss.dbeaver.ui.editors.sql.execute.SQLQueryJob.extractData(SQLQueryJob.java:849)
at org.jkiss.dbeaver.ui.editors.sql.SQLEditor$QueryResultsContainer.readData(SQLEditor.java:2720)
at org.jkiss.dbeaver.ui.controls.resultset.ResultSetJobDataRead.lambda$0(ResultSetJobDataRead.java:102)
at org.jkiss.dbeaver.model.DBUtils.tryExecuteRecover(DBUtils.java:1684)
at org.jkiss.dbeaver.ui.controls.resultset.ResultSetJobDataRead.run(ResultSetJobDataRead.java:100)
at org.jkiss.dbeaver.model.runtime.AbstractJob.run(AbstractJob.java:102)
at org.eclipse.core.internal.jobs.Worker.run(Worker.java:63)
Caused by: java.sql.SQLException: DATA_READ ERROR: Error reading from Parquet file
File: /temp/BeamRegression/link_stat_7.parquet
Column: L7_StdVeh_OutLinks
Row Group Start: 831052361
Fragment 3:4

[Error Id: dee97d7d-68da-4670-bf57-ee83a60abbf3 on DESKTOP-FONP7QD:31010]
at org.apache.drill.jdbc.impl.DrillCursor.nextRowInternally(DrillCursor.java:536)
at org.apache.drill.jdbc.impl.DrillCursor.next(DrillCursor.java:640)
at oadd.org.apache.calcite.avatica.AvaticaResultSet.next(AvaticaResultSet.java:217)
at org.apache.drill.jdbc.impl.DrillResultSetImpl.next(DrillResultSetImpl.java:151)
at org.jkiss.dbeaver.model.impl.jdbc.exec.JDBCResultSetImpl.next(JDBCResultSetImpl.java:269)
at org.jkiss.dbeaver.model.impl.jdbc.exec.JDBCResultSetImpl.nextRow(JDBCResultSetImpl.java:177)
... 12 more
Caused by: oadd.org.apache.drill.common.exceptions.UserRemoteException: DATA_READ ERROR: Error
reading from Parquet file
File: /temp/BeamRegression/link_stat_7.parquet
Column: L7_StdVeh_OutLinks
Row Group Start: 831052361
Fragment 3:4

[Error Id: dee97d7d-68da-4670-bf57-ee83a60abbf3 on DESKTOP-FONP7QD:31010]
at oadd.org.apache.drill.exec.rpc.user.QueryResultHandler.resultArrived(QueryResultHandler.java:123)
at oadd.org.apache.drill.exec.rpc.user.UserClient.handle(UserClient.java:422)
at oadd.org.apache.drill.exec.rpc.user.UserClient.handle(UserClient.java:96)
at oadd.org.apache.drill.exec.rpc.RpcBus$InboundHandler.decode(RpcBus.java:273)
at oadd.org.apache.drill.exec.rpc.RpcBus$InboundHandler.decode(RpcBus.java:243)
at oadd.io.netty.handler.codec.MessageToMessageDecoder.channelRead(MessageToMessageDecoder.java:88)
at oadd.io.netty.channel.AbstractChannelHandlerContext.invokeChannelRead(AbstractChannelHandlerContext.java:356)
at oadd.io.netty.channel.AbstractChannelHandlerContext.invokeChannelRead(AbstractChannelHandlerContext.java:342)
at oadd.io.netty.channel.AbstractChannelHandlerContext.fireChannelRead(AbstractChannelHandlerContext.java:335)
at oadd.io.netty.handler.timeout.IdleStateHandler.channelRead(IdleStateHandler.java:287)
at oadd.io.netty.channel.AbstractChannelHandlerContext.invokeChannelRead(AbstractChannelHandlerContext.java:356)
at oadd.io.netty.channel.AbstractChannelHandlerContext.invokeChannelRead(AbstractChannelHandlerContext.java:342)
at oadd.io.netty.channel.AbstractChannelHandlerContext.fireChannelRead(AbstractChannelHandlerContext.java:335)
at oadd.io.netty.handler.codec.MessageToMessageDecoder.channelRead(MessageToMessageDecoder.java:102)
at oadd.io.netty.channel.AbstractChannelHandlerContext.invokeChannelRead(AbstractChannelHandlerContext.java:356)
at oadd.io.netty.channel.AbstractChannelHandlerContext.invokeChannelRead(AbstractChannelHandlerContext.java:342)
at oadd.io.netty.channel.AbstractChannelHandlerContext.fireChannelRead(AbstractChannelHandlerContext.java:335)
at oadd.io.netty.handler.codec.ByteToMessageDecoder.fireChannelRead(ByteToMessageDecoder.java:312)
at oadd.io.netty.handler.codec.ByteToMessageDecoder.channelRead(ByteToMessageDecoder.java:286)
at oadd.io.netty.channel.AbstractChannelHandlerContext.invokeChannelRead(AbstractChannelHandlerContext.java:356)
at oadd.io.netty.channel.AbstractChannelHandlerContext.invokeChannelRead(AbstractChannelHandlerContext.java:342)
at oadd.io.netty.channel.AbstractChannelHandlerContext.fireChannelRead(AbstractChannelHandlerContext.java:335)
at oadd.io.netty.channel.ChannelInboundHandlerAdapter.channelRead(ChannelInboundHandlerAdapter.java:86)
at oadd.io.netty.channel.AbstractChannelHandlerContext.invokeChannelRead(AbstractChannelHandlerContext.java:356)
at oadd.io.netty.channel.AbstractChannelHandlerContext.invokeChannelRead(AbstractChannelHandlerContext.java:342)
at oadd.io.netty.channel.AbstractChannelHandlerContext.fireChannelRead(AbstractChannelHandlerContext.java:335)
at oadd.io.netty.channel.DefaultChannelPipeline$HeadContext.channelRead(DefaultChannelPipeline.java:1294)
at oadd.io.netty.channel.AbstractChannelHandlerContext.invokeChannelRead(AbstractChannelHandlerContext.java:356)
at oadd.io.netty.channel.AbstractChannelHandlerContext.invokeChannelRead(AbstractChannelHandlerContext.java:342)
at oadd.io.netty.channel.DefaultChannelPipeline.fireChannelRead(DefaultChannelPipeline.java:911)
at oadd.io.netty.channel.nio.AbstractNioByteChannel$NioByteUnsafe.read(AbstractNioByteChannel.java:131)
at oadd.io.netty.channel.nio.NioEventLoop.processSelectedKey(NioEventLoop.java:645)
at oadd.io.netty.channel.nio.NioEventLoop.processSelectedKeysOptimized(NioEventLoop.java:580)
at oadd.io.netty.channel.nio.NioEventLoop.processSelectedKeys(NioEventLoop.java:497)
at oadd.io.netty.channel.nio.NioEventLoop.run(NioEventLoop.java:459)
at oadd.io.netty.util.concurrent.SingleThreadEventExecutor$2.run(SingleThreadEventExecutor.java:131)
at java.lang.Thread.run(Thread.java:745)

Caused by: java.lang.ArrayIndexOutOfBoundsException: 106634
at org.apache.parquet.column.values.dictionary.PlainValuesDictionary$PlainDoubleDictionary.decodeToDouble(PlainValuesDictionary.java:208)
at org.apache.parquet.column.values.dictionary.DictionaryValuesReader.readDouble(DictionaryValuesReader.java:101)
at org.apache.drill.exec.store.parquet.columnreaders.ParquetFixedWidthDictionaryReaders$DictionaryFloat8Reader.readField(ParquetFixedWidthDictionaryReaders.java:374)
at org.apache.drill.exec.store.parquet.columnreaders.ColumnReader.readValues(ColumnReader.java:160)
at org.apache.drill.exec.store.parquet.columnreaders.ColumnReader.processPageData(ColumnReader.java:218)
at org.apache.drill.exec.store.parquet.columnreaders.ColumnReader.determineSize(ColumnReader.java:194)
at org.apache.drill.exec.store.parquet.columnreaders.ColumnReader.processPages(ColumnReader.java:141)
at org.apache.drill.exec.store.parquet.columnreaders.BatchReader.readAllFixedFieldsSerial(BatchReader.java:63)
at org.apache.drill.exec.store.parquet.columnreaders.BatchReader.readAllFixedFields(BatchReader.java:56)
at org.apache.drill.exec.store.parquet.columnreaders.BatchReader$VariableWidthReader.readRecords(BatchReader.java:157)
at org.apache.drill.exec.store.parquet.columnreaders.BatchReader.readBatch(BatchReader.java:43)
at org.apache.drill.exec.store.parquet.columnreaders.ParquetRecordReader.next(ParquetRecordReader.java:253)
at org.apache.drill.exec.physical.impl.ScanBatch.internalNext(ScanBatch.java:223)
at org.apache.drill.exec.physical.impl.ScanBatch.next(ScanBatch.java:271)
at org.apache.drill.exec.record.AbstractRecordBatch.next(AbstractRecordBatch.java:126)
at org.apache.drill.exec.record.AbstractRecordBatch.next(AbstractRecordBatch.java:116)
at org.apache.drill.exec.record.AbstractUnaryRecordBatch.innerNext(AbstractUnaryRecordBatch.java:63)
at org.apache.drill.exec.physical.impl.project.ProjectRecordBatch.innerNext(ProjectRecordBatch.java:141)
at org.apache.drill.exec.record.AbstractRecordBatch.next(AbstractRecordBatch.java:186)
at org.apache.drill.exec.physical.impl.BaseRootExec.next(BaseRootExec.java:104)
at org.apache.drill.exec.physical.impl.partitionsender.PartitionSenderRootExec.innerNext(PartitionSenderRootExec.java:152)
at org.apache.drill.exec.physical.impl.BaseRootExec.next(BaseRootExec.java:94)
at org.apache.drill.exec.work.fragment.FragmentExecutor$1.run(FragmentExecutor.java:296)
at org.apache.drill.exec.work.fragment.FragmentExecutor$1.run(FragmentExecutor.java:283)
at .......(:0)
at org.apache.hadoop.security.UserGroupInformation.doAs(UserGroupInformation.java:1746)
at org.apache.drill.exec.work.fragment.FragmentExecutor.run(FragmentExecutor.java:283)
at org.apache.drill.common.SelfCleaningRunnable.run(SelfCleaningRunnable.java:38)
at .......(:0)

{code}
I can read the same file in Spark 2.4.2:
{code:java}
spark.read.parquet(s"C:/temp/BeamRegression/link_stat_7.parquet")
    .orderBy(col("enter_time"))
    .show(10000)
{code}
Here is what I can see in the debugger (I moved the code a bit to be able to see local variables):

!image-2019-05-12-15-25-45-218.png!  


> java.lang.ArrayIndexOutOfBoundsException when query Parquet file
> ----------------------------------------------------------------
>
>                 Key: DRILL-7249
>                 URL: https://issues.apache.org/jira/browse/DRILL-7249
>             Project: Apache Drill
>          Issue Type: Bug
>            Reporter: Artavazd Balaian
>            Priority: Critical
>



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)
