hive-user mailing list archives

From yongqiang he <heyongqiang...@gmail.com>
Subject Re: Can compression be used with ColumnarSerDe ?
Date Tue, 25 Jan 2011 00:51:53 GMT
Yes. It only supports block compression (there is no record-level compression).
You can use the config 'hive.io.rcfile.record.buffer.size' to specify
the block size (before compression). The default is 4MB.
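
As a rough sketch of how the settings in this thread fit together (not a
tested recipe): the table names wide_rc and wide_seq are hypothetical, the
8 MB value is chosen only for illustration, and the buffer size is assumed
to be given in bytes.

```sql
-- Assumption: value is in bytes; 8388608 = 8 MB, double the 4 MB default.
SET hive.io.rcfile.record.buffer.size=8388608;

-- Compression settings quoted later in this thread.
SET hive.exec.compress.output=true;
SET mapred.output.compression.codec=org.apache.hadoop.io.compress.GzipCodec;

-- wide_rc: hypothetical table declared STORED AS RCFILE with ColumnarSerDe.
-- wide_seq: hypothetical source table backed by gzip sequence files.
INSERT OVERWRITE TABLE wide_rc
SELECT * FROM wide_seq;
```

The settings have to be in effect for the session that runs the INSERT
OVERWRITE; files already written to the table are not rewritten.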

Thanks
Yongqiang
On Mon, Jan 24, 2011 at 4:44 PM, Edward Capriolo <edlinuxguru@gmail.com> wrote:
> On Mon, Jan 24, 2011 at 4:42 PM, Edward Capriolo <edlinuxguru@gmail.com> wrote:
>> On Mon, Jan 24, 2011 at 4:14 PM, yongqiang he <heyongqiangict@gmail.com> wrote:
>>> How did you upload the data to the new table?
>>> You can get the data compressed by doing an INSERT OVERWRITE into the
>>> destination table with "hive.exec.compress.output" set to true.
>>>
>>> Thanks
>>> Yongqiang
>>> On Mon, Jan 24, 2011 at 12:30 PM, Edward Capriolo <edlinuxguru@gmail.com> wrote:
>>>> I am trying to explore some use cases that I believe are perfect for
>>>> the ColumnarSerDe: tables with 100+ columns where only one or two are
>>>> selected in a particular query.
>>>>
>>>> CREATE TABLE (....)
>>>> ROW FORMAT SERDE "org.apache.hadoop.hive.serde2.columnar.ColumnarSerDe"
>>>>   STORED AS RCFile ;
>>>>
>>>> My issue is that the data in our source table, stored as gzip sequence
>>>> files, is much smaller than the ColumnarSerDe table, and as a result any
>>>> performance gains are lost.
>>>>
>>>> Any ideas?
>>>>
>>>> Thank you,
>>>> Edward
>>>>
>>>
>>
>> Thank you! That was a RTFM question.
>>
>> set hive.exec.dynamic.partition.mode=nonstrict;
>> set hive.exec.compress.output=true;
>> set mapred.output.compression.codec=org.apache.hadoop.io.compress.GzipCodec;
>>
>> I was unclear about 'STORED AS RCFILE', since normally you would need
>> to use 'STORED AS SEQUENCEFILE'.
>>
>> However http://hive.apache.org/docs/r0.6.0/api/org/apache/hadoop/hive/ql/io/RCFile.html
>> explains this well. RCFILE is a special type of sequence file.
>>
>> I did get it working. Compression looks good: my table was smaller
>> than with a GZIP BLOCK sequence file. Query time was slightly better in
>> limited testing. Cool stuff.
>>
>> Edward
>>
>
> Do RCFiles support a block size for compression, like other compressed
> sequence files?
>
