drill-issues mailing list archives

From "Paul Wilson (JIRA)" <j...@apache.org>
Subject [jira] [Updated] (DRILL-5451) Query on csv file w/ header fails with an exception when non existing column is requested if file is over 4096 lines long
Date Fri, 28 Apr 2017 10:13:04 GMT

     [ https://issues.apache.org/jira/browse/DRILL-5451?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Paul Wilson updated DRILL-5451:
-------------------------------
    Attachment: 4097_lines.csvh

> Query on csv file w/ header fails with an exception when non existing column is requested if file is over 4096 lines long
> -------------------------------------------------------------------------------------------------------------------------
>
>                 Key: DRILL-5451
>                 URL: https://issues.apache.org/jira/browse/DRILL-5451
>             Project: Apache Drill
>          Issue Type: Bug
>          Components: Storage - Text & CSV
>    Affects Versions: 1.10.0
>         Environment: Tested on CentOs 7 and Ubuntu
>            Reporter: Paul Wilson
>         Attachments: 4097_lines.csvh
>
>
> When querying a text (csv) file with extractHeader set to true, selecting a nonexistent
> column works as expected (returns an "empty" value) when the file has 4096 lines or fewer
> (1 header plus 4095 data), but results in an IndexOutOfBoundsException when the file has
> 4097 lines or more.
> With Storage config:
> {code:javascript}
> "csvh": {
>       "type": "text",
>       "extensions": [
>         "csvh"
>       ],
>       "extractHeader": true,
>       "delimiter": ","
>     }
> {code}
> In the following, 4096_lines.csvh is identical to 4097_lines.csvh with the last line
> removed.
> Results:
> {noformat}
> 0: jdbc:drill:zk=local> select * from dfs.`/test/4097_lines.csvh` LIMIT 2;
> +----------+------------------------+
> | line_no  |    line_description    |
> +----------+------------------------+
> | 2        | this is line number 2  |
> | 3        | this is line number 3  |
> +----------+------------------------+
> 2 rows selected (2.455 seconds)
> 0: jdbc:drill:zk=local> select line_no, non_existent_field from dfs.`/test/4096_lines.csvh` LIMIT 2;
> +----------+---------------------+
> | line_no  | non_existent_field  |
> +----------+---------------------+
> | 2        |                     |
> | 3        |                     |
> +----------+---------------------+
> 2 rows selected (2.248 seconds)
> 0: jdbc:drill:zk=local> select line_no, non_existent_field from dfs.`/test/4097_lines.csvh` LIMIT 2;
> Error: SYSTEM ERROR: IndexOutOfBoundsException: index: 16384, length: 4 (expected: range(0, 16384))
> Fragment 0:0
> [Error Id: eb0974a8-026d-4048-9f10-ffb821a0d300 on localhost:31010]
>   (java.lang.IndexOutOfBoundsException) index: 16384, length: 4 (expected: range(0, 16384))
>     io.netty.buffer.DrillBuf.checkIndexD():123
>     io.netty.buffer.DrillBuf.chk():147
>     io.netty.buffer.DrillBuf.getInt():520
>     org.apache.drill.exec.vector.UInt4Vector$Accessor.get():358
>     org.apache.drill.exec.vector.VarCharVector$Mutator.setValueCount():659
>     org.apache.drill.exec.physical.impl.ScanBatch.next():234
>     org.apache.drill.exec.record.AbstractRecordBatch.next():119
>     org.apache.drill.exec.record.AbstractRecordBatch.next():109
>     org.apache.drill.exec.record.AbstractSingleRecordBatch.innerNext():51
>     org.apache.drill.exec.physical.impl.limit.LimitRecordBatch.innerNext():115
>     org.apache.drill.exec.record.AbstractRecordBatch.next():162
>     org.apache.drill.exec.record.AbstractRecordBatch.next():119
>     org.apache.drill.exec.record.AbstractRecordBatch.next():109
>     org.apache.drill.exec.record.AbstractSingleRecordBatch.innerNext():51
>     org.apache.drill.exec.physical.impl.svremover.RemovingRecordBatch.innerNext():93
>     org.apache.drill.exec.record.AbstractRecordBatch.next():162
>     org.apache.drill.exec.record.AbstractRecordBatch.next():119
>     org.apache.drill.exec.record.AbstractRecordBatch.next():109
>     org.apache.drill.exec.record.AbstractSingleRecordBatch.innerNext():51
>     org.apache.drill.exec.physical.impl.project.ProjectRecordBatch.innerNext():135
>     org.apache.drill.exec.record.AbstractRecordBatch.next():162
>     org.apache.drill.exec.physical.impl.BaseRootExec.next():104
>     org.apache.drill.exec.physical.impl.ScreenCreator$ScreenRoot.innerNext():81
>     org.apache.drill.exec.physical.impl.BaseRootExec.next():94
>     org.apache.drill.exec.work.fragment.FragmentExecutor$1.run():232
>     org.apache.drill.exec.work.fragment.FragmentExecutor$1.run():226
>     java.security.AccessController.doPrivileged():-2
>     javax.security.auth.Subject.doAs():422
>     org.apache.hadoop.security.UserGroupInformation.doAs():1657
>     org.apache.drill.exec.work.fragment.FragmentExecutor.run():226
>     org.apache.drill.common.SelfCleaningRunnable.run():38
>     java.util.concurrent.ThreadPoolExecutor.runWorker():1142
>     java.util.concurrent.ThreadPoolExecutor$Worker.run():617
>     java.lang.Thread.run():745 (state=,code=0)
> 0: jdbc:drill:zk=local> 
> {noformat}
> This seems similar to the issue fixed in [DRILL-4108|https://issues.apache.org/jira/browse/DRILL-4108],
> but it now only manifests for longer files.
> I also see a similar result (i.e. it works for <= 4096 lines, but throws an
> IndexOutOfBoundsException for > 4096 lines) for a {noformat} SELECT count(*) ...{noformat}
> on these files.
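
To reproduce the report without the attachment, the two test files can probably be regenerated with a short script. The following is a minimal sketch, not the reporter's actual method: the header names and row text are inferred from the query output above, and the helper name write_csvh is hypothetical.

```python
from pathlib import Path


def write_csvh(path, total_lines):
    """Write a .csvh file with 1 header line plus (total_lines - 1) data lines.

    Assumption: the layout mirrors the query output in the report, where the
    header is line 1 and the first data row is "2,this is line number 2".
    """
    rows = ["line_no,line_description"]
    # Data rows are numbered by their position in the file, so they start at 2.
    rows += [f"{n},this is line number {n}" for n in range(2, total_lines + 1)]
    Path(path).write_text("\n".join(rows) + "\n")
    return len(rows)


if __name__ == "__main__":
    # 4096 lines is the largest size reported to work; 4097 triggers the error.
    write_csvh("4096_lines.csvh", 4096)
    write_csvh("4097_lines.csvh", 4097)
```

Copying both files to /test/ and running the queries above against a workspace that uses the csvh format config should then exercise both the passing and the failing case.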



--
This message was sent by Atlassian JIRA
(v6.3.15#6346)
