drill-issues mailing list archives

From "Paul Rogers (JIRA)" <j...@apache.org>
Subject [jira] [Commented] (DRILL-5557) java.lang.IndexOutOfBoundsException: writerIndex:
Date Wed, 31 May 2017 19:00:07 GMT

    [ https://issues.apache.org/jira/browse/DRILL-5557?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16031724#comment-16031724 ]

Paul Rogers commented on DRILL-5557:
------------------------------------

I've seen something similar; maybe my experience can help pin down this problem.

DRILL-5470 describes a user who saw bizarre string lengths when reading CSV data, probably
due to vector corruption.

DRILL-5487 describes a case where a truncated last row in a CSV file leads to vector corruption.
In that case, just one row was missing and we got some strange behavior. If more rows are
missing, it might mean we get the error seen here.

Drill has the ability to "back-fill" values when reading files, such as JSON, that may omit
columns in some records. The back-filling works only for some types. Back-filling is *not*
done at the end of a batch. This may be the cause of the issue here.

Normally, CSV files have the same columns in every row. I wonder: does your data file have
"missing" columns at the end of the file, like this?

{code}
a, b, c
10, 20, 30
11
12
{code}

Or do you have one file with, say, three columns and other files with only two? (That is,
does the number of columns change from file to file?)
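
One way to answer both questions before re-running the query is to scan the input files
outside of Drill. The following is only a rough sketch, not part of Drill: the function
names (short_rows, header_widths) are made up for illustration, and it assumes plain
comma-delimited text with the header on the first line.

```python
import csv
import io


def short_rows(text, delimiter=","):
    """Return (line_number, field_count) for each data row that has
    fewer fields than the header row. Hypothetical helper, not a Drill API."""
    reader = csv.reader(io.StringIO(text), delimiter=delimiter)
    header = next(reader, None)
    if header is None:
        return []  # empty input: nothing to check
    expected = len(header)
    problems = []
    for row in reader:
        if not row:
            continue  # ignore completely blank lines
        if len(row) < expected:
            problems.append((reader.line_num, len(row)))
    return problems


def header_widths(files, delimiter=","):
    """Map each file name to the number of columns in its header row.
    `files` is a dict of name -> file contents (an assumption of this sketch)."""
    widths = {}
    for name, text in files.items():
        header = next(csv.reader(io.StringIO(text), delimiter=delimiter), [])
        widths[name] = len(header)
    return widths


if __name__ == "__main__":
    sample = "a, b, c\n10, 20, 30\n11\n12\n"
    print(short_rows(sample))
    print(header_widths({"one.csv": sample, "two.csv": "a, b\n1, 2\n"}))
```

If short_rows reports any lines, or header_widths shows different counts across files, that
would point to the missing-column cases described above.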

Would be good to finally nail down this issue...

> java.lang.IndexOutOfBoundsException: writerIndex: 
> --------------------------------------------------
>
>                 Key: DRILL-5557
>                 URL: https://issues.apache.org/jira/browse/DRILL-5557
>             Project: Apache Drill
>          Issue Type: Bug
>    Affects Versions: 1.10.0
>            Reporter: renlu
>




--
This message was sent by Atlassian JIRA
(v6.3.15#6346)
