drill-issues mailing list archives

Site index · List index
Message view « Date » · « Thread »
Top « Date » · « Thread »
From "Khurram Faraaz (JIRA)" <j...@apache.org>
Subject [jira] [Updated] (DRILL-5550) SELECT non-existent column produces empty required VARCHAR
Date Thu, 06 Jul 2017 07:02:01 GMT

     [ https://issues.apache.org/jira/browse/DRILL-5550?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
]

Khurram Faraaz updated DRILL-5550:
----------------------------------
    Component/s: Storage - Text & CSV

> SELECT non-existent column produces empty required VARCHAR
> ----------------------------------------------------------
>
>                 Key: DRILL-5550
>                 URL: https://issues.apache.org/jira/browse/DRILL-5550
>             Project: Apache Drill
>          Issue Type: Bug
>          Components: Storage - Text & CSV
>    Affects Versions: 1.10.0
>            Reporter: Paul Rogers
>            Priority: Minor
>
> Drill's CSV column reader supports two forms of files:
> * Files with column headers as the first line of the file.
> * Files without column headers.
> The CSV storage plugin specifies which format to use for files accessed via that storage
plugin config.
> Suppose we have a CSV file with headers:
> {code}
> a,b,c
> 10,foo,bar
> {code}
> Suppose we configure a storage plugin to use headers:
> {code}
>     TextFormatConfig csvFormat = new TextFormatConfig();
>     csvFormat.fieldDelimiter = ',';
>     csvFormat.skipFirstLine = false;
>     csvFormat.extractHeader = true;
> {code}
> (The above can also be done using JSON when running Drill as a server.)
> Execute the following query:
> {code}
> SELECT a, c, d FROM `dfs.data.example.csv`
> {code}
> Results:
> {code}
> a,c,d
> 10,bar,
> {code}
> The actual type of column {{d}} is non-nullable VARCHAR.
> This is inconsistent with other parts of Drill in two ways, one may be a bug. Most other
parts of Drill use a nullable INT for "missing" columns.
> 1. For CSV it makes sense for the data type to be VARCHAR, since all CSV columns are
of that type.
> 2. It may *not* make sense for the column to be non-nullable and blank rather than nullable
and NULL. In SQL, NULL means that the data is unknown, which is the case here.
> In the future, we may want to use some other indication for a missing column. Until then,
the requested change is to make the type of a missing CSV column a nullable VARCHAR set to
value NULL.



--
This message was sent by Atlassian JIRA
(v6.4.14#64029)

Mime
View raw message