drill-issues mailing list archives

From "Deneche A. Hakim (JIRA)" <j...@apache.org>
Subject [jira] [Commented] (DRILL-4317) Exceptions on SELECT and CTAS with large CSV files
Date Wed, 16 Mar 2016 11:25:33 GMT

    [ https://issues.apache.org/jira/browse/DRILL-4317?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15197208#comment-15197208 ]

Deneche A. Hakim commented on DRILL-4317:
-----------------------------------------

I found a bug in TextInput.updateLengthBasedOnConstraint() that shows up when Drill splits CSV files. In most cases the split is handled fine, but when the split line ends with an empty value AND one of the previous rows in the same last batch contains a value in the last column, we see the exceptions described above.
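For anyone trying to reproduce the triggering pattern, the row shape can be sketched as below. This is a hypothetical generator, not the reporter's actual data; the key values and row counts are illustrative, and it only demonstrates the CSV shape (an earlier row with a value in the last column, a later row ending with an empty value), not the Drill split itself.

```python
import csv
import io

def make_rows(n_keys=4, n_vals=39):
    """Rows shaped like the reporter's file: 4 key columns, then 39 numeric columns."""
    rows = []
    # A row whose LAST column holds a value (one precondition from the comment).
    full = ["2015-10-17 00:00", "r5z2f2i9", "err", "mi1"] + [str(i + 1) for i in range(n_vals)]
    rows.append(full)
    # A row whose trailing columns are empty, so its line ends with "," (the
    # "split line ends with an empty value" case when the split lands here).
    sparse = ["2015-10-17 00:00", "f5e9v8u2", "err", "fr7"] + ["1", "2", "3"] + [""] * (n_vals - 3)
    rows.append(sparse)
    return rows

buf = io.StringIO()
csv.writer(buf, lineterminator="\n").writerows(make_rows())
lines = buf.getvalue().splitlines()

# The combination the comment identifies as the trigger:
assert lines[0].split(",")[-1] != ""   # earlier row: value in the last column
assert lines[1].endswith(",")          # later row: empty last value
```

Feeding a large file built from rows like these through a query that forces multiple splits should, per the comment, exercise the affected code path.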

> Exceptions on SELECT and CTAS with large CSV files
> --------------------------------------------------
>
>                 Key: DRILL-4317
>                 URL: https://issues.apache.org/jira/browse/DRILL-4317
>             Project: Apache Drill
>          Issue Type: Bug
>          Components: Storage - Text & CSV
>    Affects Versions: 1.4.0, 1.5.0
>         Environment: 4 node cluster, Hadoop 2.7.0, 14.04.1-Ubuntu
>            Reporter: Matt Keranen
>            Assignee: Deneche A. Hakim
>
> Selecting from a CSV file or running a CTAS into Parquet generates exceptions.
> Source file is ~650MB: a table of 4 key columns followed by 39 numeric data columns, otherwise a fairly simple format. Example:
> {noformat}
> 2015-10-17 00:00,f5e9v8u2,err,fr7,226020793,76.094,26307,226020793,76.094,26307,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,
> 2015-10-17 00:00,c3f9x5z2,err,mi1,1339159295,216.004,177690,1339159295,216.004,177690,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,
> 2015-10-17 00:00,r5z2f2i9,err,mi1,7159994629,39718.011,65793,6142021303,30687.811,64630,143777403,40.521,146,75503742,41.905,89,170771174,168.165,198,192565529,370.475,222,97577280,318.068,120,62631452,288.253,68,32371173,189.527,39,41712265,299.184,46,39046408,363.418,47,34182318,465.343,43,127834582,6485.341,145
> 2015-10-17 00:00,j9s6i8t2,err,fr7,20580443899,277445.055,67826,2814893469,85447.816,54275,2584757097,608.001,2044,1395571268,769.113,1051,3070616988,3000.005,2284,3413811671,6489.060,2569,1772235156,5806.214,1339,1097879284,5064.120,858,691884865,4035.397,511,672967845,4815.875,518,789163614,7306.684,599,813910495,10632.464,627,1462752147,143470.306,1151
> {noformat}
> A "SELECT from `/path/to/file.csv`" runs for tens of minutes and eventually results in:
> {noformat}
> java.lang.IndexOutOfBoundsException: index: 547681, length: 1 (expected: range(0, 547681))
>         at io.netty.buffer.AbstractByteBuf.checkIndex(AbstractByteBuf.java:1134)
>         at io.netty.buffer.PooledUnsafeDirectByteBuf.getBytes(PooledUnsafeDirectByteBuf.java:136)
>         at io.netty.buffer.WrappedByteBuf.getBytes(WrappedByteBuf.java:289)
>         at io.netty.buffer.UnsafeDirectLittleEndian.getBytes(UnsafeDirectLittleEndian.java:26)
>         at io.netty.buffer.DrillBuf.getBytes(DrillBuf.java:586)
>         at io.netty.buffer.DrillBuf.getBytes(DrillBuf.java:586)
>         at io.netty.buffer.DrillBuf.getBytes(DrillBuf.java:586)
>         at io.netty.buffer.DrillBuf.getBytes(DrillBuf.java:586)
>         at org.apache.drill.exec.vector.VarCharVector$Accessor.get(VarCharVector.java:443)
>         at org.apache.drill.exec.vector.accessor.VarCharAccessor.getBytes(VarCharAccessor.java:125)
>         at org.apache.drill.exec.vector.accessor.VarCharAccessor.getString(VarCharAccessor.java:146)
>         at org.apache.drill.exec.vector.accessor.VarCharAccessor.getObject(VarCharAccessor.java:136)
>         at org.apache.drill.exec.vector.accessor.VarCharAccessor.getObject(VarCharAccessor.java:94)
>         at org.apache.drill.exec.vector.accessor.BoundCheckingAccessor.getObject(BoundCheckingAccessor.java:148)
>         at org.apache.drill.jdbc.impl.TypeConvertingSqlAccessor.getObject(TypeConvertingSqlAccessor.java:795)
>         at org.apache.drill.jdbc.impl.AvaticaDrillSqlAccessor.getObject(AvaticaDrillSqlAccessor.java:179)
>         at net.hydromatic.avatica.AvaticaResultSet.getObject(AvaticaResultSet.java:351)
>         at org.apache.drill.jdbc.impl.DrillResultSetImpl.getObject(DrillResultSetImpl.java:420)
>         at sqlline.Rows$Row.<init>(Rows.java:157)
>         at sqlline.IncrementalRows.hasNext(IncrementalRows.java:63)
>         at sqlline.TableOutputFormat$ResizingRowsProvider.next(TableOutputFormat.java:87)
>         at sqlline.TableOutputFormat.print(TableOutputFormat.java:118)
>         at sqlline.SqlLine.print(SqlLine.java:1593)
>         at sqlline.Commands.execute(Commands.java:852)
>         at sqlline.Commands.sql(Commands.java:751)
>         at sqlline.SqlLine.dispatch(SqlLine.java:746)
>         at sqlline.SqlLine.begin(SqlLine.java:621)
>         at sqlline.SqlLine.start(SqlLine.java:375)
>         at sqlline.SqlLine.main(SqlLine.java:268)
> {noformat}
> A CTAS on the same file with storage as Parquet results in:
> {noformat}
> Error: SYSTEM ERROR: IllegalArgumentException: length: -260 (expected: >= 0)
> Fragment 1:2
> [Error Id: 1807615e-4385-4f85-8402-5900aaa568e9 on es07:31010]
>   (java.lang.IllegalArgumentException) length: -260 (expected: >= 0)
>     io.netty.buffer.AbstractByteBuf.checkIndex():1131
>     io.netty.buffer.PooledUnsafeDirectByteBuf.nioBuffer():344
>     io.netty.buffer.WrappedByteBuf.nioBuffer():727
>     io.netty.buffer.UnsafeDirectLittleEndian.nioBuffer():26
>     io.netty.buffer.DrillBuf.nioBuffer():356
>     org.apache.drill.exec.store.ParquetOutputRecordWriter$VarCharParquetConverter.writeField():1842
>     org.apache.drill.exec.store.EventBasedRecordWriter.write():62
>     org.apache.drill.exec.physical.impl.WriterRecordBatch.innerNext():106
>     org.apache.drill.exec.record.AbstractRecordBatch.next():162
>     org.apache.drill.exec.physical.impl.BaseRootExec.next():104
>     org.apache.drill.exec.physical.impl.SingleSenderCreator$SingleSenderRootExec.innerNext():93
>     org.apache.drill.exec.physical.impl.BaseRootExec.next():94
>     org.apache.drill.exec.work.fragment.FragmentExecutor$1.run():256
>     org.apache.drill.exec.work.fragment.FragmentExecutor$1.run():250
>     java.security.AccessController.doPrivileged():-2
>     javax.security.auth.Subject.doAs():415
>     org.apache.hadoop.security.UserGroupInformation.doAs():1657
>     org.apache.drill.exec.work.fragment.FragmentExecutor.run():250
>     org.apache.drill.common.SelfCleaningRunnable.run():38
>     java.util.concurrent.ThreadPoolExecutor.runWorker():1145
>     java.util.concurrent.ThreadPoolExecutor$Worker.run():615
>     java.lang.Thread.run():745 (state=,code=0)
> {noformat}



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)
