drill-dev mailing list archives

From "Sean Hsuan-Yi Chu (JIRA)" <j...@apache.org>
Subject [jira] [Created] (DRILL-3808) When reading TSV files, TextReader does not follow the standard
Date Sat, 19 Sep 2015 04:30:04 GMT
Sean Hsuan-Yi Chu created DRILL-3808:
----------------------------------------

             Summary: When reading TSV files, TextReader does not follow the standard
                 Key: DRILL-3808
                 URL: https://issues.apache.org/jira/browse/DRILL-3808
             Project: Apache Drill
          Issue Type: Bug
          Components: Storage - Text & CSV
            Reporter: Sean Hsuan-Yi Chu
            Assignee: Sean Hsuan-Yi Chu
            Priority: Critical


According to references [1], [2]:

In .csv, the double quote is a special character because it can optionally enclose a text field.
In .tsv, it is not a special character: it can appear anywhere, and when it does, it
should be treated as a literal. The TSV format specification also does not allow tab
or CR/LF characters to appear inside text fields. However, Drill treats TSV the
same way as CSV.

For example, given the data:
{code}
"test"\t"test"
{code}
and the query select columns[0], columns[1] from `t.tsv`; Drill returns
{code}
test      test
{code}
However, according to reference [2], the result should be
{code}
"test"      "test"
{code}
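
The difference between the two behaviors can be illustrated outside Drill. The following is a minimal Python sketch, not Drill's TextReader; the function names parse_csv_style and parse_tsv_literal are hypothetical and exist only for this illustration. The first parser treats the double quote as an enclosing character (the CSV convention); the second splits on tabs only and keeps quotes as literal data, which is how [2] is read in this report.

```python
import csv

# The sample row from the report: two quoted fields separated by a tab.
LINE = '"test"\t"test"'


def parse_csv_style(line):
    """Split on tabs but treat '"' as an enclosing quote character.

    This mirrors the behavior the report describes for Drill today.
    """
    return next(csv.reader([line], delimiter="\t"))


def parse_tsv_literal(line):
    """Split on tabs only; every other character, including '"', is data.

    This is the behavior the report asks for, based on reference [2].
    """
    return line.rstrip("\r\n").split("\t")


if __name__ == "__main__":
    print(parse_csv_style(LINE))    # quotes are stripped
    print(parse_tsv_literal(LINE))  # quotes are preserved
```

With Python's csv module, the literal behavior can also be obtained by passing quoting=csv.QUOTE_NONE to csv.reader, which suggests that a reader-level switch for quote handling is one possible shape for a fix.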

Ideally, Drill should follow the standard; see [2].
[1] CSV - https://tools.ietf.org/html/rfc4180
[2] TSV - http://www.iana.org/assignments/media-types/text/tab-separated-values




--
This message was sent by Atlassian JIRA
(v6.3.4#6332)
