Mailing-List: contact dev-help@drill.apache.org; run by ezmlm
Precedence: bulk
Reply-To: dev@drill.apache.org
Date: Mon, 29 May 2017 21:14:05 +0000 (UTC)
From: "Paul Rogers (JIRA)" <jira@apache.org>
To: dev@drill.apache.org
Message-ID: <JIRA.13075703.1496092432000.316616.1496092445207@Atlassian.JIRA>
In-Reply-To: <JIRA.13075703.1496092432000@Atlassian.JIRA>
References: <JIRA.13075703.1496092432000@Atlassian.JIRA> <JIRA.13075703.1496092432029@jira-lw-us.apache.org>
Subject: [jira] [Created] (DRILL-5548) SELECT * against an empty CSV file
 with headers produces error
MIME-Version: 1.0
Content-Type: text/plain; charset=utf-8
Content-Transfer-Encoding: 7bit
archived-at: Mon, 29 May 2017 21:14:11 -0000

Paul Rogers created DRILL-5548:
----------------------------------

             Summary: SELECT * against an empty CSV file with headers produces error
                 Key: DRILL-5548
                 URL: https://issues.apache.org/jira/browse/DRILL-5548
             Project: Apache Drill
          Issue Type: Bug
    Affects Versions: 1.10.0
            Reporter: Paul Rogers
            Priority: Minor


Drill's CSV column reader supports two forms of files:

* Files with column headers as the first line of the file.
* Files without column headers.

The CSV storage plugin specifies which format to use for files accessed via that storage plugin config.

Suppose we have an empty file. When queried with the CSV plugin configured without headers, the query works. The schema returned is the {{columns}} Varchar array, and the results contain no rows. Good.

Now, query the same file with the CSV plugin configured to use headers.

{code}
    TextFormatConfig csvFormat = new TextFormatConfig();
    csvFormat.fieldDelimiter = ',';
    csvFormat.skipFirstLine = false;
    csvFormat.extractHeader = true;
{code}

(The above can also be done using JSON when running Drill as a server.)

We get the following exception:

{code}
org.apache.drill.common.exceptions.UserRemoteException: 
SYSTEM ERROR: IllegalStateException: 
Incoming batch [#4, ProjectRecordBatch] has an empty schema. 
This is not allowed.
{code}

This particular case is a bit tricky. First, we want headers, but there are none. We can interpret this as an error (a file with headers must have headers). Or, we can treat it as a file that happens to have no columns. The latter choice is a bit more general.

The file also has no data rows. This could be an error, or it too could just be treated as a result set of zero rows.

Combined, the result set is one with no columns and no rows: an empty result set. This is actually a valid (if not very useful) result in SQL.
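
To make the two interpretations concrete, here is a minimal, self-contained sketch of the "no columns, no rows" reading. It is only an illustration: {{EmptyCsvSketch}}, {{Result}} and {{read}} are hypothetical names invented for this example, not Drill classes, and the parsing is deliberately naive (no quoting or escaping).

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.Collections;
import java.util.List;

// Hypothetical illustration only: none of these names are Drill classes.
final class EmptyCsvSketch {

    /** A result set: column names plus the rows read from the file. */
    static final class Result {
        final List<String> schema;
        final List<List<String>> rows;

        Result(List<String> schema, List<List<String>> rows) {
            this.schema = schema;
            this.rows = rows;
        }
    }

    /** Reads CSV text, optionally treating the first line as column headers. */
    static Result read(String fileText, boolean extractHeader) {
        // Collect non-blank lines; an empty file yields an empty list.
        List<String> lines = new ArrayList<>();
        for (String line : fileText.split("\\R")) {
            if (!line.isEmpty()) {
                lines.add(line);
            }
        }

        List<String> schema;
        int firstDataLine;
        if (!extractHeader) {
            // Headerless mode: the schema is always the single `columns` array.
            schema = Collections.singletonList("columns");
            firstDataLine = 0;
        } else if (lines.isEmpty()) {
            // Header mode, empty file: no header line, so no columns (not an error).
            schema = Collections.emptyList();
            firstDataLine = 0;
        } else {
            // Header mode, non-empty file: the first line names the columns.
            schema = Arrays.asList(lines.get(0).split(",", -1));
            firstDataLine = 1;
        }

        // Every remaining line is a data row; an empty file produces zero rows.
        List<List<String>> rows = new ArrayList<>();
        for (String line : lines.subList(firstDataLine, lines.size())) {
            rows.add(Arrays.asList(line.split(",", -1)));
        }
        return new Result(schema, rows);
    }

    public static void main(String[] args) {
        Result withHeaders = read("", true);
        System.out.println("schema=" + withHeaders.schema + " rows=" + withHeaders.rows.size());
    }
}
```

In this sketch the empty file with headers simply yields an empty schema and an empty row list. Whether a downstream operator accepts such a batch is a separate question, and that is where the exception above arises.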

A conversation with Jinfeng suggested that, in such a scenario, the reader is supposed to make up a dummy column so that the result is not empty. While this works as a workaround, it seems to just push the problem from the Project operator into each of the many record readers.

