hbase-user mailing list archives

From: Lars George <lars.geo...@gmail.com>
Subject: Re: HBase Bulk Load script
Date: Tue, 28 Dec 2010 09:29:34 GMT
Hi Marc,

Actually, HFileOutputFormat is what you need to target; what you describe
below applies to other file formats and their compression. HFOF has support
for compressing the data as it is written, so either add this to your
configuration:

conf.set("hfile.compression", "lzo");

or add this to the job startup command:

-Dhfile.compression=lzo

(or with another compression codec obviously).
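
For concreteness, here is a minimal sketch of the first approach in a
bulk-load driver. Class and table names are illustrative, and the
configureIncrementalLoad wiring is just one common way to set the job up;
the hfile.compression key is the essential bit:

import org.apache.hadoop.conf.Configuration;
import org.apache.hadoop.hbase.HBaseConfiguration;
import org.apache.hadoop.hbase.client.HTable;
import org.apache.hadoop.hbase.mapreduce.HFileOutputFormat;
import org.apache.hadoop.mapreduce.Job;

public class CompressedBulkLoad {
  public static void main(String[] args) throws Exception {
    Configuration conf = HBaseConfiguration.create();

    // Tell HFileOutputFormat to compress HFiles as it writes them.
    // "lzo" assumes the LZO codec is installed on every node;
    // "gz" works out of the box.
    conf.set("hfile.compression", "lzo");

    Job job = new Job(conf, "bulk load with compressed HFiles");
    // ... set mapper, input format, and input path here ...

    // configureIncrementalLoad wires in HFileOutputFormat and a total
    // order partitioner matching the table's region boundaries.
    HTable table = new HTable(conf, "mytable");
    HFileOutputFormat.configureIncrementalLoad(job, table);

    System.exit(job.waitForCompletion(true) ? 0 : 1);
  }
}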

Lars


On Tue, Dec 28, 2010 at 2:07 AM, Marc Limotte <mslimotte@gmail.com> wrote:
> Lars, Todd,
>
> Thanks for the info.  If I understand correctly, the importtsv command-line
> tool will not compress by default and there is no command-line switch for
> it, but I can modify the source at
> hbase-0.89.20100924+28/src/main/java/org/apache/hadoop/hbase/mapreduce/ImportTsv.java
> to call FileOutputFormat.setCompressOutput/setOutputCompressorClass() on the
> Job in order to turn on compression.
>
> Does that sound right?
>
> Marc
>
>
> On Thu, Dec 23, 2010 at 2:34 PM, Todd Lipcon <todd@cloudera.com> wrote:
>
>> You beat me to it, Lars! Was writing a response when some family arrived
>> for
>> the holidays, and when I came back, you had written just what I had started
>> :)
>>
>> On Thu, Dec 23, 2010 at 1:51 PM, Lars George <lars.george@gmail.com>
>> wrote:
>>
>> > live ones and then moved into place from their temp location. Not sure
>> > what happens if the local cluster has no /hbase etc.
>> >
>> > Todd, could you help here?
>> >
>>
>> Yep, there is a code path where, if the HFiles are on a different
>> filesystem, it will copy them to the HBase filesystem first. It's not very
>> efficient, though, so it's probably better to distcp them to the local
>> cluster first.
>>
>> -Todd
>> --
>> Todd Lipcon
>> Software Engineer, Cloudera
>>
>
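
P.S. For reference, the ImportTsv change you describe would look roughly
like this hypothetical helper; as noted above, though, these
FileOutputFormat switches govern other file-based output formats, not
HFileOutputFormat, which is driven by hfile.compression:

import org.apache.hadoop.io.compress.GzipCodec;
import org.apache.hadoop.mapreduce.Job;
import org.apache.hadoop.mapreduce.lib.output.FileOutputFormat;

public class ImportTsvCompressionPatch {
  // Hypothetical helper illustrating the two calls proposed for
  // ImportTsv's job setup. GzipCodec stands in for any installed
  // codec (e.g. an LZO codec class).
  static void enableOutputCompression(Job job) {
    FileOutputFormat.setCompressOutput(job, true);
    FileOutputFormat.setOutputCompressorClass(job, GzipCodec.class);
  }
}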
