drill-issues mailing list archives

From "Paul Rogers (JIRA)" <j...@apache.org>
Subject [jira] [Commented] (DRILL-6176) Drill skips a row when querying a text file but does not report it.
Date Thu, 22 Feb 2018 05:20:01 GMT

    [ https://issues.apache.org/jira/browse/DRILL-6176?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16372425#comment-16372425 ]

Paul Rogers commented on DRILL-6176:
------------------------------------

The code works as designed and configured.

You are running afoul of a feature of CSV. Lines starting with the '#' character are treated
as comments and ignored. The sixth line of the file:
{noformat}
#@$%@#$%@#$%#%@#$%#^@%^$&%&*^%&*^#%@$%...
{noformat}
Start the line with any other character and the line won't be ignored.
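
To make the behavior concrete, here is a minimal sketch of a reader that silently drops comment lines. It is not Drill's actual text reader; the class {{CommentSkipDemo}}, the method {{readRows}}, and the sample data are all made up for illustration.

```java
import java.util.ArrayList;
import java.util.List;

// Illustration only: NOT Drill's text reader. All names and data are hypothetical.
class CommentSkipDemo {

    // Keeps every line except those whose first character is the comment character.
    static List<String> readRows(List<String> lines, char comment) {
        List<String> rows = new ArrayList<>();
        for (String line : lines) {
            if (!line.isEmpty() && line.charAt(0) == comment) {
                continue; // dropped silently, with no warning to the user
            }
            rows.add(line);
        }
        return rows;
    }

    public static void main(String[] args) {
        // Ten made-up rows; the sixth starts with '#', like the row in the report.
        List<String> lines = new ArrayList<>();
        for (int i = 1; i <= 10; i++) {
            lines.add(i == 6 ? "#@$%|row6" : "value" + i + "|row" + i);
        }
        System.out.println(readRows(lines, '#').size());  // 9
        System.out.println(readRows(lines, '\b').size()); // 10
    }
}
```

With '#' as the comment character the sixth row disappears; with a character that never starts a line, all ten rows come back.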

The # character is used in some CSV-like files such as Microsoft IIS access logs.

There is another JIRA for this issue. I thought we allowed setting the comment character to
0 to disable the feature, but that fix is not in the code, so it may never have been made.

The text format plugin defines the following property:
{code:java}
    public char comment = '#';
{code}
In your format plugin config for the ".tbl" suffix, change the comment character to a character
that does not occur in your file. Not pretty, but you can try backspace, which should never occur: `\b`.
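
Before picking a replacement, it may help to confirm that the candidate really is absent from the data. The following is a rough sketch of such a check; {{CommentCharCheck}} and {{isUnused}} are hypothetical names, not part of Drill, and the sample lines are invented.

```java
import java.util.Arrays;
import java.util.List;

// Illustration only: a hypothetical pre-check, not part of Drill.
class CommentCharCheck {

    // Returns true if the candidate character appears nowhere in the given lines.
    static boolean isUnused(List<String> lines, char candidate) {
        for (String line : lines) {
            if (line.indexOf(candidate) >= 0) {
                return false;
            }
        }
        return true;
    }

    public static void main(String[] args) {
        // Made-up sample data.
        List<String> lines = Arrays.asList("abc|1", "#@$%|2", "x#y|3");
        System.out.println(isUnused(lines, '#'));  // false
        System.out.println(isUnused(lines, '\b')); // true
    }
}
```

The check looks at every position rather than only the first character of each line. That is stricter than the behavior described above requires, but it matches the advice to pick something that is not in the file at all.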

> Drill skips a row when querying a text file but does not report it.
> -------------------------------------------------------------------
>
>                 Key: DRILL-6176
>                 URL: https://issues.apache.org/jira/browse/DRILL-6176
>             Project: Apache Drill
>          Issue Type: Bug
>          Components: Execution - Data Types
>    Affects Versions: 1.12.0
>            Reporter: Robert Hou
>            Assignee: Pritesh Maker
>            Priority: Critical
>         Attachments: 10.tbl
>
>
> I tried to query 10 rows from a tbl file.  It skipped the 6th row, which only has special
> symbols in it.  So it shows 9 rows.  And there was no warning that a row is skipped.
> I checked the special symbols.  The same symbols appear in other rows.
> This also occurs if the file is a csv file.



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)
