drill-issues mailing list archives

From "Deneche A. Hakim (JIRA)" <j...@apache.org>
Subject [jira] [Commented] (DRILL-4317) Exceptions on SELECT and CTAS with large CSV files
Date Tue, 15 Mar 2016 07:56:33 GMT

    [ https://issues.apache.org/jira/browse/DRILL-4317?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15194880#comment-15194880 ]

Deneche A. Hakim commented on DRILL-4317:
-----------------------------------------

You can reproduce it in HDFS

> Exceptions on SELECT and CTAS with large CSV files
> --------------------------------------------------
>
>                 Key: DRILL-4317
>                 URL: https://issues.apache.org/jira/browse/DRILL-4317
>             Project: Apache Drill
>          Issue Type: Bug
>          Components: Storage - Text & CSV
>    Affects Versions: 1.4.0, 1.5.0
>         Environment: 4 node cluster, Hadoop 2.7.0, 14.04.1-Ubuntu
>            Reporter: Matt Keranen
>            Assignee: Deneche A. Hakim
>
> Selecting from a CSV file or running a CTAS into Parquet generates exceptions.
> Source file is ~650MB, a table of 4 key columns followed by 39 numeric data columns,
> otherwise a fairly simple format. Example:
> {noformat}
> 2015-10-17 00:00,f5e9v8u2,err,fr7,226020793,76.094,26307,226020793,76.094,26307,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,
> 2015-10-17 00:00,c3f9x5z2,err,mi1,1339159295,216.004,177690,1339159295,216.004,177690,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,
> 2015-10-17 00:00,r5z2f2i9,err,mi1,7159994629,39718.011,65793,6142021303,30687.811,64630,143777403,40.521,146,75503742,41.905,89,170771174,168.165,198,192565529,370.475,222,97577280,318.068,120,62631452,288.253,68,32371173,189.527,39,41712265,299.184,46,39046408,363.418,47,34182318,465.343,43,127834582,6485.341,145
> 2015-10-17 00:00,j9s6i8t2,err,fr7,20580443899,277445.055,67826,2814893469,85447.816,54275,2584757097,608.001,2044,1395571268,769.113,1051,3070616988,3000.005,2284,3413811671,6489.060,2569,1772235156,5806.214,1339,1097879284,5064.120,858,691884865,4035.397,511,672967845,4815.875,518,789163614,7306.684,599,813910495,10632.464,627,1462752147,143470.306,1151
> {noformat}
> A "SELECT from `/path/to/file.csv`" runs for tens of minutes and eventually results in:
> {noformat}
> java.lang.IndexOutOfBoundsException: index: 547681, length: 1 (expected: range(0, 547681))
>         at io.netty.buffer.AbstractByteBuf.checkIndex(AbstractByteBuf.java:1134)
>         at io.netty.buffer.PooledUnsafeDirectByteBuf.getBytes(PooledUnsafeDirectByteBuf.java:136)
>         at io.netty.buffer.WrappedByteBuf.getBytes(WrappedByteBuf.java:289)
>         at io.netty.buffer.UnsafeDirectLittleEndian.getBytes(UnsafeDirectLittleEndian.java:26)
>         at io.netty.buffer.DrillBuf.getBytes(DrillBuf.java:586)
>         at io.netty.buffer.DrillBuf.getBytes(DrillBuf.java:586)
>         at io.netty.buffer.DrillBuf.getBytes(DrillBuf.java:586)
>         at io.netty.buffer.DrillBuf.getBytes(DrillBuf.java:586)
>         at org.apache.drill.exec.vector.VarCharVector$Accessor.get(VarCharVector.java:443)
>         at org.apache.drill.exec.vector.accessor.VarCharAccessor.getBytes(VarCharAccessor.java:125)
>         at org.apache.drill.exec.vector.accessor.VarCharAccessor.getString(VarCharAccessor.java:146)
>         at org.apache.drill.exec.vector.accessor.VarCharAccessor.getObject(VarCharAccessor.java:136)
>         at org.apache.drill.exec.vector.accessor.VarCharAccessor.getObject(VarCharAccessor.java:94)
>         at org.apache.drill.exec.vector.accessor.BoundCheckingAccessor.getObject(BoundCheckingAccessor.java:148)
>         at org.apache.drill.jdbc.impl.TypeConvertingSqlAccessor.getObject(TypeConvertingSqlAccessor.java:795)
>         at org.apache.drill.jdbc.impl.AvaticaDrillSqlAccessor.getObject(AvaticaDrillSqlAccessor.java:179)
>         at net.hydromatic.avatica.AvaticaResultSet.getObject(AvaticaResultSet.java:351)
>         at org.apache.drill.jdbc.impl.DrillResultSetImpl.getObject(DrillResultSetImpl.java:420)
>         at sqlline.Rows$Row.<init>(Rows.java:157)
>         at sqlline.IncrementalRows.hasNext(IncrementalRows.java:63)
>         at sqlline.TableOutputFormat$ResizingRowsProvider.next(TableOutputFormat.java:87)
>         at sqlline.TableOutputFormat.print(TableOutputFormat.java:118)
>         at sqlline.SqlLine.print(SqlLine.java:1593)
>         at sqlline.Commands.execute(Commands.java:852)
>         at sqlline.Commands.sql(Commands.java:751)
>         at sqlline.SqlLine.dispatch(SqlLine.java:746)
>         at sqlline.SqlLine.begin(SqlLine.java:621)
>         at sqlline.SqlLine.start(SqlLine.java:375)
>         at sqlline.SqlLine.main(SqlLine.java:268)
> {noformat}
> A CTAS on the same file with storage as Parquet results in:
> {noformat}
> Error: SYSTEM ERROR: IllegalArgumentException: length: -260 (expected: >= 0)
> Fragment 1:2
> [Error Id: 1807615e-4385-4f85-8402-5900aaa568e9 on es07:31010]
>   (java.lang.IllegalArgumentException) length: -260 (expected: >= 0)
>     io.netty.buffer.AbstractByteBuf.checkIndex():1131
>     io.netty.buffer.PooledUnsafeDirectByteBuf.nioBuffer():344
>     io.netty.buffer.WrappedByteBuf.nioBuffer():727
>     io.netty.buffer.UnsafeDirectLittleEndian.nioBuffer():26
>     io.netty.buffer.DrillBuf.nioBuffer():356
>     org.apache.drill.exec.store.ParquetOutputRecordWriter$VarCharParquetConverter.writeField():1842
>     org.apache.drill.exec.store.EventBasedRecordWriter.write():62
>     org.apache.drill.exec.physical.impl.WriterRecordBatch.innerNext():106
>     org.apache.drill.exec.record.AbstractRecordBatch.next():162
>     org.apache.drill.exec.physical.impl.BaseRootExec.next():104
>     org.apache.drill.exec.physical.impl.SingleSenderCreator$SingleSenderRootExec.innerNext():93
>     org.apache.drill.exec.physical.impl.BaseRootExec.next():94
>     org.apache.drill.exec.work.fragment.FragmentExecutor$1.run():256
>     org.apache.drill.exec.work.fragment.FragmentExecutor$1.run():250
>     java.security.AccessController.doPrivileged():-2
>     javax.security.auth.Subject.doAs():415
>     org.apache.hadoop.security.UserGroupInformation.doAs():1657
>     org.apache.drill.exec.work.fragment.FragmentExecutor.run():250
>     org.apache.drill.common.SelfCleaningRunnable.run():38
>     java.util.concurrent.ThreadPoolExecutor.runWorker():1145
>     java.util.concurrent.ThreadPoolExecutor$Worker.run():615
>     java.lang.Thread.run():745 (state=,code=0)
> {noformat}
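
When trying to reproduce this, it may help to first rule out malformed input. The sketch below is not part of the original report. It assumes the layout described above (4 key columns plus 39 numeric columns, so 43 fields per row) and a plain comma-delimited file; the function name `field_count_histogram` is made up for illustration.

```python
import csv
import io
from collections import Counter

# Assumption taken from the issue description: 4 key columns + 39 numeric columns.
EXPECTED_FIELDS = 4 + 39


def field_count_histogram(lines, expected=EXPECTED_FIELDS):
    """Count how many rows have each field count.

    `lines` is any iterable of text lines (an open file or io.StringIO).
    Returns (Counter mapping field count -> number of rows,
             list of (row_number, field_count) for rows that differ from `expected`).
    """
    counts = Counter()
    mismatches = []
    for row_no, row in enumerate(csv.reader(lines), start=1):
        counts[len(row)] += 1
        if len(row) != expected:
            mismatches.append((row_no, len(row)))
    return counts, mismatches


if __name__ == "__main__":
    # Synthetic demo data, not the reporter's file: two well-formed rows
    # and one row that is a field short.
    good = ",".join(["x"] * EXPECTED_FIELDS)
    short = ",".join(["x"] * (EXPECTED_FIELDS - 1))
    sample = io.StringIO(f"{good}\n{short}\n{good}\n")
    counts, mismatches = field_count_histogram(sample)
    print(dict(counts))
    print(mismatches)
```

For a real file, pass `open(path, newline="")` instead of the `io.StringIO` sample. If every row reports 43 fields, the input is probably well formed and the exceptions are more likely to come from the reader or writer side.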



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)
