drill-issues mailing list archives

From "Hari Sekhon (JIRA)" <j...@apache.org>
Subject [jira] [Created] (DRILL-3529) Drill proper DESCRIBE support for CSV
Date Tue, 21 Jul 2015 13:25:04 GMT
Hari Sekhon created DRILL-3529:

             Summary: Drill proper DESCRIBE support for CSV
                 Key: DRILL-3529
                 URL: https://issues.apache.org/jira/browse/DRILL-3529
             Project: Apache Drill
          Issue Type: Bug
          Components: Metadata, Storage - Text & CSV
    Affects Versions: 1.1.0
            Reporter: Hari Sekhon
            Assignee: Steven Phillips

Request to add full DESCRIBE support for CSV files.

Currently the DESCRIBE command prints a blank table instead of the CSV header / schema.

This depends on DRILL-624, under which the header line would be read as the schema of the CSV.

After DRILL-624 is completed, I propose the following solution:

When dealing with a directory of multiple CSV files, it might make sense to read the headers
of N CSV files by default. Extend the DESCRIBE command so that the number of CSV file headers
read and presented is user-configurable, and add an ALL keyword that scans every CSV file
header so the schema of all the data can be printed authoritatively.
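
To make the idea concrete, here is a rough Python sketch of the header sampling described
above. It is not Drill code: the function name sample_csv_headers, its limit parameter, and
the default of 10 files are all hypothetical, and limit=None stands in for the proposed ALL
keyword. It assumes the first line of each file is the header.

```python
import csv
from pathlib import Path
from typing import List, Optional


def sample_csv_headers(directory: str, limit: Optional[int] = 10) -> List[str]:
    """Merge header columns from up to `limit` CSV files (None means all files).

    Hypothetical helper for illustration only; the default of 10 is an assumption.
    """
    # Sort by file name so the sampled subset is deterministic.
    files = sorted(Path(directory).glob("*.csv"))
    if limit is not None:
        files = files[:limit]

    columns: List[str] = []
    for path in files:
        with path.open(newline="") as handle:
            # Read only the first row; an empty file contributes no columns.
            header = next(csv.reader(handle), [])
        for name in header:
            # Keep each distinct column once, in first-seen order.
            if name not in columns:
                columns.append(name)
    return columns
```

With a limit, the result is only a sample of the schema; only the scan over all files can be
called authoritative, which is the distinction the ALL keyword is meant to express.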

It might also make sense to read the newest and oldest CSV files by timestamp or by file name
formatting conventions to pick up the different headers.
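
A minimal sketch of the timestamp variant, again in Python and not taken from Drill: the
hypothetical helper oldest_and_newest_csv picks the two files whose headers would be read,
using the file modification time as the timestamp (an assumption; the issue does not say
which timestamp is meant).

```python
from pathlib import Path
from typing import List


def oldest_and_newest_csv(directory: str) -> List[Path]:
    """Return the oldest and newest CSV files in a directory by modification time.

    Hypothetical helper for illustration only.
    """
    files = sorted(Path(directory).glob("*.csv"), key=lambda p: p.stat().st_mtime)
    if len(files) <= 1:
        # Zero or one file: nothing to compare, return what is there.
        return files
    return [files[0], files[-1]]
```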

This message was sent by Atlassian JIRA
