hive-user mailing list archives

From wd ...@wdicc.com>
Subject Re: Load gzip files into hive
Date Thu, 28 Apr 2011 10:34:54 GMT
Thanks for your help

2011/4/28 Loren Siebert <loren@siebert.org>

> You have the file type as sequence file, but you are trying to load a GZip
> file. Won’t that only work if the table is defined as a text file?
>
I used to think a sequence file was the same thing as a gzip file, and now I
realize it is not. The load works when the table is defined as a text file.


>
> Hive isn’t doing anything on your behalf when you do LOAD DATA. It’s
> syntactic sugar for copying a file into a HDFS location. From there, if you
> want a RCFile table or a sequence file table or whatever, you can select
> from the raw_logs table into the new table (e.g., raw_logs_rcfile) that you
> have defined in the different format.
>
So, is this the only way I can put data into a table defined as a sequence
file? Can I generate the RCFile using a Unix command or some other tool?
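
For reference, the approach Loren describes might look roughly like the
sketch below. The table names raw_logs_text and raw_logs_seq and the single
"line" column are hypothetical, since the real column list of raw_logs is not
shown in this thread. The compression settings are optional and assume the
goal is still to save disk space.

```sql
-- Staging table stored as plain text. Hive can read gzip-compressed
-- text files directly, so LOAD DATA on a .gz file works here.
-- (Hypothetical name and schema; substitute the real columns.)
CREATE TABLE raw_logs_text (line STRING)
PARTITIONED BY (dt INT)
STORED AS TEXTFILE;

-- LOAD DATA only copies the file into the table's location; it does not
-- convert the file format.
LOAD DATA LOCAL INPATH 'tmp_b.20110426.gz'
INTO TABLE raw_logs_text PARTITION (dt=20110426);

-- Target table in the desired storage format (hypothetical name).
CREATE TABLE raw_logs_seq (line STRING)
PARTITIONED BY (dt INT)
STORED AS SEQUENCEFILE;

-- Optional: write block-compressed sequence files.
SET hive.exec.compress.output=true;
SET io.seqfile.compression.type=BLOCK;

-- The INSERT ... SELECT is what rewrites the data as sequence files.
INSERT OVERWRITE TABLE raw_logs_seq PARTITION (dt=20110426)
SELECT line FROM raw_logs_text WHERE dt=20110426;
```

The same pattern should apply to an RCFile target by declaring the second
table with STORED AS RCFILE instead.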


>
> On Apr 27, 2011, at 9:33 PM, wd wrote:
>
> hi,
>
> I've tried to load gzip files into hive to save disk space, but failed.
>
> hive> load data local inpath 'tmp_b.20110426.gz' into table raw_logs
> partition ( dt=20110426 );
> Copying data from file:/home/wd/t/tmp_b.20110426.gz
> Copying file: file:/home/wd/t/tmp_b.20110426.gz
> Loading data to table default.raw_logs partition (dt=20110426)
> Failed with exception Wrong file format. Please check the file's format.
> FAILED: Execution Error, return code 1 from
> org.apache.hadoop.hive.ql.exec.MoveTask
>
> The raw_logs table is created by:
> create table raw_logs ( ............)  partitioned by ( dt int ) STORED AS
> SEQUENCEFILE;
>
> Is there something wrong? The error is same both in hive 0.5 and 0.7.
>
>
>
