drill-issues mailing list archives

From "Hari Sekhon (JIRA)" <j...@apache.org>
Subject [jira] [Updated] (DRILL-1712) Quoted CSV parsing
Date Wed, 17 Dec 2014 10:34:13 GMT

     [ https://issues.apache.org/jira/browse/DRILL-1712?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Hari Sekhon updated DRILL-1712:
-------------------------------
    Description: 
When querying CSV files, Drill doesn't handle quoted fields properly: it treats the enclosing
double quotes as part of the data. The directory /tmp/hari in MapR-FS has two simple CSV files,
one quoted and one unquoted, so you can see the difference.
{code}
0: jdbc:drill:> select * from dfs.`/tmp/hari` limit 10;
+------------+
|  columns   |
+------------+
| ["1","2","3"] |
| ["4","5","6"] |
| ["7","8","9"] |
| ["\"1\"","\"2\"","\"3\""] |
| ["\"4\"","\"5\"","\"6\""] |
| ["\"7\"","\"8\"","\"9\""] |
+------------+
6 rows selected (0.238 seconds)

cat hari/hari.csv
1,2,3
4,5,6
7,8,9
cat hari/hari2.csv
"1","2","3"
"4","5","6"
"7","8","9"
{code}
Drill shouldn't include the quotes as data; they only enclose the field values.
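
To illustrate the expected result, here is a minimal sketch using Python's standard csv module. It is not Drill code and says nothing about how Drill's text reader is implemented; the sample data is copied from the two files above, and the helper names parse and naive_split are made up for this example. The naive comma split reproduces the quoted values seen in the query output, while a quote-aware reader returns the same rows for both files.

```python
import csv
import io

# Contents of the two sample files shown in the report above.
UNQUOTED = '1,2,3\n4,5,6\n7,8,9\n'
QUOTED = '"1","2","3"\n"4","5","6"\n"7","8","9"\n'


def parse(text):
    """Parse CSV text with a quote-aware reader (quotes enclose fields)."""
    return list(csv.reader(io.StringIO(text)))


def naive_split(text):
    """Split on commas only, so the quotes end up inside the values.

    Hypothetical helper that mimics the output in the report; it is
    not taken from Drill.
    """
    return [line.split(",") for line in text.splitlines()]


if __name__ == "__main__":
    # Quote-aware parsing: both files yield identical rows.
    print(parse(QUOTED))
    print(parse(QUOTED) == parse(UNQUOTED))
    # Comma-only splitting keeps the quotes as data.
    print(naive_split(QUOTED)[0])
```

With a quote-aware parser, both files would yield the rows 1,2,3 / 4,5,6 / 7,8,9 with no quote characters in the values.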

This is related to DRILL-950 but is not the same issue.

Regards,

Hari Sekhon
http://www.linkedin.com/in/harisekhon


> Quoted CSV parsing
> ------------------
>
>                 Key: DRILL-1712
>                 URL: https://issues.apache.org/jira/browse/DRILL-1712
>             Project: Apache Drill
>          Issue Type: Improvement
>    Affects Versions: 0.6.0
>         Environment: MapR 4.0.1 M5
>            Reporter: Hari Sekhon



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)
