drill-dev mailing list archives

From "Paul Rogers (JIRA)" <j...@apache.org>
Subject [jira] [Created] (DRILL-5548) SELECT * against an empty CSV file with headers produces error
Date Mon, 29 May 2017 21:14:05 GMT
Paul Rogers created DRILL-5548:
----------------------------------

             Summary: SELECT * against an empty CSV file with headers produces error
                 Key: DRILL-5548
                 URL: https://issues.apache.org/jira/browse/DRILL-5548
             Project: Apache Drill
          Issue Type: Bug
    Affects Versions: 1.10.0
            Reporter: Paul Rogers
            Priority: Minor


Drill's CSV column reader supports two forms of files:

* Files with column headers as the first line of the file.
* Files without column headers.

The CSV storage plugin specifies which format to use for files accessed via that storage plugin
config.

Suppose we have an empty file. When queried with the CSV configuration that has no headers, the query
works. The schema returned is the {{columns}} Varchar array, and the results contain no rows.
Good.

Now, query the same file with the CSV plugin configured to use headers.

{code}
    TextFormatConfig csvFormat = new TextFormatConfig();
    csvFormat.fieldDelimiter = ',';
    csvFormat.skipFirstLine = false;  // Do not discard the first line...
    csvFormat.extractHeader = true;   // ...read it as the column headers instead.
{code}

(The above can also be done using JSON when running Drill as a server.)

We get the following exception:

{code}
org.apache.drill.common.exceptions.UserRemoteException: 
SYSTEM ERROR: IllegalStateException: 
Incoming batch [#4, ProjectRecordBatch] has an empty schema. 
This is not allowed.
{code}

This particular case is a bit tricky. First, we want headers, but there are none. We can interpret
this as an error (a file with headers must have headers). Or, we can treat it as a file that
happens to have no columns. The latter choice is a bit more general.

The file also has no data rows. This could be an error, or it too could just be treated as
a result set of zero rows.

Combined, the result set is one with no columns and no rows: an empty result set. This is
actually a valid (if not very useful) result in SQL.
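
To make the two interpretations concrete, here is a minimal sketch of the proposed behavior. It is
not Drill's reader code: the class {{EmptyCsvSketch}}, its {{Result}} type and the {{read}} method
are hypothetical names made up for illustration, and the parsing ignores quoting and escapes.
Without headers, every line is data exposed through a single {{columns}} array. With headers, an
empty file simply yields zero columns and zero rows rather than an error.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.Collections;
import java.util.List;

// Hypothetical sketch for illustration only; not Drill's actual reader.
final class EmptyCsvSketch {

    /** A result set: column names plus rows of string values. */
    static final class Result {
        final List<String> columns;
        final List<List<String>> rows;

        Result(List<String> columns, List<List<String>> rows) {
            this.columns = columns;
            this.rows = rows;
        }
    }

    /**
     * Parses naive CSV text (no quoting or escapes).
     * With extractHeader, the first line supplies the column names.
     * Without it, each line becomes one row of values for the single "columns" array.
     */
    static Result read(String text, boolean extractHeader) {
        // Split into lines and drop empty ones, so "" produces no lines at all.
        List<String> lines = new ArrayList<>();
        for (String line : text.split("\\R")) {
            if (!line.isEmpty()) {
                lines.add(line);
            }
        }

        List<String> columns;
        int firstDataLine;
        if (!extractHeader) {
            columns = Collections.singletonList("columns");
            firstDataLine = 0;
        } else if (lines.isEmpty()) {
            // Headers requested but none present: a file with no columns.
            columns = Collections.emptyList();
            firstDataLine = 0;
        } else {
            columns = Arrays.asList(lines.get(0).split(",", -1));
            firstDataLine = 1;
        }

        List<List<String>> rows = new ArrayList<>();
        for (int i = firstDataLine; i < lines.size(); i++) {
            rows.add(Arrays.asList(lines.get(i).split(",", -1)));
        }
        return new Result(columns, rows);
    }
}
```

Calling {{EmptyCsvSketch.read("", false)}} gives the one {{columns}} column and no rows, matching
the case that works today. Calling {{EmptyCsvSketch.read("", true)}} gives no columns and no rows,
which is the empty result set described above.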

Conversation with Jinfeng suggested that, in such a scenario, the reader is supposed to make
up a dummy column so that the result is not empty. While this is a workaround, it seems to
just push the problem from the Project operator into each of many record readers.
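
For comparison, the dummy-column workaround might look roughly like the sketch below. This is an
assumption about the shape of the workaround, not code taken from Drill: the class
{{DummyColumnSketch}}, the method {{nonEmptySchema}} and the placeholder name {{dummy}} are all
invented for this example. The point is that every record reader would need its own copy of this
check.

```java
import java.util.ArrayList;
import java.util.Collections;
import java.util.List;

// Hypothetical sketch of the workaround; names are invented for illustration.
final class DummyColumnSketch {

    // Placeholder column name chosen for this sketch only; not a name Drill uses.
    static final String PLACEHOLDER = "dummy";

    /**
     * Returns the reader's column list, substituting a single placeholder
     * column when the reader found no columns at all.
     */
    static List<String> nonEmptySchema(List<String> readerColumns) {
        if (readerColumns.isEmpty()) {
            return Collections.singletonList(PLACEHOLDER);
        }
        return new ArrayList<>(readerColumns);
    }
}
```

With this in place the downstream operator never sees an empty schema, but the query result now
contains a column that does not exist in the file.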



