hadoop-mapreduce-user mailing list archives

From Alexander Pivovarov <apivova...@gmail.com>
Subject Re: how to load data
Date Wed, 29 Apr 2015 22:43:17 GMT
1. Create an external textfile Hive table pointing to /extract/DBCLOC and
specify the CSV SerDe.

If you are using hive-0.14 or newer, use
https://cwiki.apache.org/confluence/display/Hive/CSV+Serde
If you are on hive-0.13 or older, use https://github.com/ogrodnek/csv-serde

You do not even need to gunzip the file. Hive automatically decompresses
gzipped text data on select.

2. Run a simple query to load the data:
insert overwrite table <orc_table>
select * from <csv_table>;
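
For the DBCLOC table from the question, the two steps might look roughly
like the sketch below. It is not tested against the actual file and rests on
several assumptions: Hive 0.14 or newer (so the built-in OpenCSVSerde is
available), a comma-separated file with no header row, and CSV columns in the
same order as the target table. The staging table name dbcloc_csv is
hypothetical, and the partition values 2015 and 1 are placeholders, since the
question does not say which fiscal year and period the file belongs to.
OpenCSVSerde exposes every column as a string, so the select casts back to
the target types.

```sql
-- Step 1: external staging table over the directory holding the .csv.gz file.
-- dbcloc_csv is a hypothetical name. OpenCSVSerde defaults are used
-- (separator ",", quote "\"", escape "\\"); all columns come back as STRING.
CREATE EXTERNAL TABLE dbcloc_csv (
  blwhse    string,
  blsdat    string,
  blreg_num string,
  bltrn_num string,
  blscnr    string,
  blareq    string,
  blatak    string,
  blmsgc    string
)
ROW FORMAT SERDE 'org.apache.hadoop.hive.serde2.OpenCSVSerde'
STORED AS TEXTFILE
LOCATION '/extract/DBCLOC';

-- Step 2: load one partition of the target table.
-- The partition values below are placeholders, not taken from the question.
INSERT OVERWRITE TABLE DBCLOC
PARTITION (FSCAL_YEAR = 2015, FSCAL_PERIOD = 1)
SELECT
  CAST(blwhse    AS int),
  blsdat,
  CAST(blreg_num AS smallint),
  CAST(bltrn_num AS int),
  blscnr,
  blareq,
  blatak,
  blmsgc
FROM dbcloc_csv;
```

If the fiscal year and period are actually columns in the CSV, a dynamic
partition insert would be the more natural choice than the static partition
shown here.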

On Wed, Apr 29, 2015 at 3:26 PM, Kumar Jayapal <kjayapal17@gmail.com> wrote:

> Hello All,
>
>
> I have this table
>
>
> CREATE  TABLE DBCLOC(
>    BLwhse int COMMENT 'DECIMAL(5,0) Whse',
>    BLsdat string COMMENT 'DATE Sales Date',
>    BLreg_num smallint COMMENT 'DECIMAL(3,0) Reg#',
>    BLtrn_num int COMMENT 'DECIMAL(5,0) Trn#',
>    BLscnr string COMMENT 'CHAR(1) Scenario',
>    BLareq string COMMENT 'CHAR(1) Act Requested',
>    BLatak string COMMENT 'CHAR(1) Act Taken',
>    BLmsgc string COMMENT 'CHAR(3) Msg Code')
> PARTITIONED BY (FSCAL_YEAR  smallint, FSCAL_PERIOD smallint)
> STORED AS PARQUET;
>
> I have to load the data from the HDFS location
> /extract/DBCLOC/DBCL0301P.csv.gz into the table above.
>
>
> Can anyone tell me the most efficient way of doing it?
>
>
> Thanks
> Jay
>
