hive-dev mailing list archives

From "Phabricator (JIRA)" <j...@apache.org>
Subject [jira] [Updated] (HIVE-3682) when output hive table to file,users should could have a separator of their own choice
Date Tue, 16 Apr 2013 07:55:17 GMT

     [ https://issues.apache.org/jira/browse/HIVE-3682?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Phabricator updated HIVE-3682:
------------------------------

    Attachment: HIVE-3682.D10275.1.patch

khorgath requested code review of "HIVE-3682 [jira] when output hive table to file,users should
could have a separator of their own choice".

Reviewers: JIRA

HIVE-3682 Supporting custom INSERT OVERWRITE LOCAL DIRECTORY syntax with SerDe and Outputformat
support

By default, when a Hive table is written out to a file, its columns are separated by the ^A character
(that is, \001).
Users should be able to set a separator of their own choice.
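
To illustrate why the separator matters to whoever consumes the exported files, here is a minimal Python sketch that splits one exported text line on its field separator. The helper name split_row and the sample values are hypothetical and not part of the patch; the only fact taken from the description above is that the default separator is ^A (\001).

```python
# Hypothetical helper, for illustration only (not part of HIVE-3682).
DEFAULT_FIELD_SEP = "\x01"  # ^A, i.e. \001, the default described above


def split_row(line, field_sep=DEFAULT_FIELD_SEP):
    """Split one exported text line into its column values."""
    return line.rstrip("\n").split(field_sep)


# Made-up sample lines: one with the default separator, one with ':'.
default_line = "k1\x01v1\n"
custom_line = "k1:v1\n"

print(split_row(default_line))      # consumer must know about \001
print(split_row(custom_line, ":"))  # separator chosen by the user
```

With the default separator, every downstream tool has to be told about the non-printing \001 character; a user-chosen separator avoids that.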

In addition, we need to support a custom SerDe specification for the output (such as an
available JSON SerDe),
or an output format specification such as 'stored as rcfile'.
This allows cases
where we want to export data that is meant to be copied into DFS elsewhere and read directly
as an external table.

Usage Example:
create table for_test (key string, value string);
load data local inpath './in1.txt' into table for_test;
select * from for_test;
UT-01: default field separator is \001, line separator is \n
insert overwrite local directory './test-01'
select * from src ;

create table array_table (a array<string>, b array<string>)
ROW FORMAT DELIMITED
FIELDS TERMINATED BY '\t'
COLLECTION ITEMS TERMINATED BY ',';

load data local inpath "../hive/examples/files/arraytest.txt" overwrite into table array_table;

CREATE TABLE map_table (foo STRING , bar MAP<STRING, STRING>)
ROW FORMAT DELIMITED
FIELDS TERMINATED BY '\t'
COLLECTION ITEMS TERMINATED BY ','
MAP KEYS TERMINATED BY ':'
STORED AS TEXTFILE;

UT-02: field separator defined as ':'
insert overwrite local directory './test-02'
row format delimited
FIELDS TERMINATED BY ':'
select * from src ;

UT-03: the line separator is NOT ALLOWED to be redefined as another separator
insert overwrite local directory './test-03'
row format delimited
FIELDS TERMINATED BY ':'
select * from src ;

UT-04: define map separators
insert overwrite local directory './test-04'
row format delimited
FIELDS TERMINATED BY '\t'
COLLECTION ITEMS TERMINATED BY ','
MAP KEYS TERMINATED BY ':'
select * from src;

UT-05: STORED-AS specification
insert overwrite local directory './test-05'
stored as rcfile
select * from src;

UT-06: custom SerDe specification for output
insert overwrite local directory './test-06'
row format serde 'org.apache.hadoop.hive.serde2.DelimitedJSONSerDe'
stored as textfile
select * from src;
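
As a rough sketch of how a consumer might read a file written with the UT-04 separators, the Python below parses one line laid out like map_table (a string column followed by a map column). It assumes the text output joins map entries with the collection separator and joins each key and value with the map key separator, and it ignores escaping. The function name parse_map_row and the sample line are hypothetical.

```python
# Hypothetical parser, for illustration only (not part of HIVE-3682).
def parse_map_row(line, field_sep="\t", item_sep=",", key_sep=":"):
    """Parse one line shaped like map_table: foo STRING, bar MAP<STRING, STRING>."""
    # Split off the first column; the rest of the line is the serialized map.
    foo, raw_map = line.rstrip("\n").split(field_sep, 1)
    bar = {}
    if raw_map:
        for item in raw_map.split(item_sep):
            # partition() keeps everything after the first key separator as the value.
            key, _, value = item.partition(key_sep)
            bar[key] = value
    return foo, bar


# Made-up sample line using the UT-04 separators.
print(parse_map_row("row1\tk1:v1,k2:v2\n"))
```

The three separators have to be distinct from one another for the line to be parsed back unambiguously, which is why they are specified together in the row format clause.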

TEST PLAN
  Included .q files

REVISION DETAIL
  https://reviews.facebook.net/D10275

AFFECTED FILES
  data/files/array_table.txt
  data/files/map_table.txt
  ql/src/java/org/apache/hadoop/hive/ql/parse/HiveParser.g
  ql/src/java/org/apache/hadoop/hive/ql/parse/QB.java
  ql/src/java/org/apache/hadoop/hive/ql/parse/SemanticAnalyzer.java
  ql/src/java/org/apache/hadoop/hive/ql/plan/LocalDirectoryDesc.java
  ql/src/java/org/apache/hadoop/hive/ql/plan/PlanUtils.java
  ql/src/test/queries/clientpositive/insert_overwrite_local_directory_1.q
  ql/src/test/results/clientpositive/insert_overwrite_local_directory_1.q.out

To: JIRA, khorgath

                
> when output hive table to file,users should could have a separator of their own choice
> --------------------------------------------------------------------------------------
>
>                 Key: HIVE-3682
>                 URL: https://issues.apache.org/jira/browse/HIVE-3682
>             Project: Hive
>          Issue Type: New Feature
>          Components: CLI
>    Affects Versions: 0.8.1
>         Environment: Linux 3.0.0-14-generic #23-Ubuntu SMP Mon Nov 21 20:34:47 UTC 2011 i686 i686 i386 GNU/Linux
> java version "1.6.0_25"
> hadoop-0.20.2-cdh3u0
> hive-0.8.1
>            Reporter: caofangkun
>            Assignee: Gang Tim Liu
>         Attachments: HIVE-3682-1.patch, HIVE-3682.D10275.1.patch, HIVE-3682.with.serde.patch
>
>
> By default, when a Hive table is written out to a file, its columns are separated by the ^A character (that is, \001).
> Users should be able to set a separator of their own choice.
> Usage Example:
> create table for_test (key string, value string);
> load data local inpath './in1.txt' into table for_test;
> select * from for_test;
> UT-01: default field separator is \001, line separator is \n
> insert overwrite local directory './test-01' 
> select * from src ;
> create table array_table (a array<string>, b array<string>)
> ROW FORMAT DELIMITED
> FIELDS TERMINATED BY '\t'
> COLLECTION ITEMS TERMINATED BY ',';
> load data local inpath "../hive/examples/files/arraytest.txt" overwrite into table array_table;
> CREATE TABLE map_table (foo STRING , bar MAP<STRING, STRING>)
> ROW FORMAT DELIMITED
> FIELDS TERMINATED BY '\t'
> COLLECTION ITEMS TERMINATED BY ','
> MAP KEYS TERMINATED BY ':'
> STORED AS TEXTFILE;
> UT-02: field separator defined as ':'
> insert overwrite local directory './test-02' 
> row format delimited 
> FIELDS TERMINATED BY ':' 
> select * from src ;
> UT-03: the line separator is NOT ALLOWED to be redefined as another separator
> insert overwrite local directory './test-03' 
> row format delimited 
> FIELDS TERMINATED BY ':' 
> select * from src ;
> UT-04: define map separators 
> insert overwrite local directory './test-04' 
> row format delimited 
> FIELDS TERMINATED BY '\t'
> COLLECTION ITEMS TERMINATED BY ','
> MAP KEYS TERMINATED BY ':'
> select * from src;
