impala-issues mailing list archives

Site index · List index
Message view « Date » · « Thread »
Top « Date » · « Thread »
From "Greg Rahn (JIRA)" <j...@apache.org>
Subject [jira] [Resolved] (IMPALA-522) Allow shorthand in CREATE TABLE for CSV and TSV
Date Wed, 12 Apr 2017 00:27:41 GMT

     [ https://issues.apache.org/jira/browse/IMPALA-522?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
]

Greg Rahn resolved IMPALA-522.
------------------------------
    Resolution: Won't Do

> Allow shorthand in CREATE TABLE for CSV and TSV
> -----------------------------------------------
>
>                 Key: IMPALA-522
>                 URL: https://issues.apache.org/jira/browse/IMPALA-522
>             Project: IMPALA
>          Issue Type: Improvement
>          Components: Frontend
>    Affects Versions: Impala 1.1, Impala 2.5.0
>            Reporter: John Russell
>            Priority: Trivial
>
> Currently, the precise way to create a CSV or TSV table is to specify:
> row format delimited fields terminated by ',' stored as textfile
> row format delimited fields terminated by '\t' stored as textfile
> Some parts are optional (STORED AS TEXTFILE), others are redundant (is there any TEXTFILE
that is not ROW FORMAT DELIMITED? could you have FIELDS TERMINATED BY without ROW FORMAT DELIMITED?)
or confusing (could STORED AS or FIELDS TERMINATED BY go ahead of ROW FORMAT?).
> Since the 3 clauses STORED AS TEXTFILE, ROW FORMAT DELIMITED, and FIELDS TERMINATED BY
always go hand-in-hand, how about some shorthand notation for the most common cases, for example:
> STORED AS CSV
> STORED AS TSV
> to allow creating tables for comma-separated and tab-separated data without the verbose
and redundant clauses?
> I did a little research online and all the other common separators do not have well-known
acronyms. Sometimes space is a separator, but that is a recipe for trouble with most kinds
of string data. Pipe is popular, but there is no tradition of a .psv file extension or PSV
acronym.* Colon is sometimes used, but any acronym would conflict with CSV for comma.
> Cf: http://en.wikipedia.org/wiki/Delimiter-separated_values
> * Now that I've had some more experience with choosing delimiter characters, I'm voting
for STORED AS PSV to make a table with pipe-separated values. Sometimes you need a character
that's visible, so tab won't do. So pipe is the logical fallback choice if the data contains
any commas.



--
This message was sent by Atlassian JIRA
(v6.3.15#6346)

Mime
View raw message