hbase-user mailing list archives

From Todd Lipcon <t...@cloudera.com>
Subject Re: HBase Bulk Load script
Date Wed, 29 Dec 2010 19:51:13 GMT
Also, docs patches welcome :)

On Tue, Dec 28, 2010 at 1:29 AM, Lars George <lars.george@gmail.com> wrote:

> Hi Marc,
>
> Actually, HFileOutputFormat is what you need to target; the calls below
> are for other file formats and their compression. HFOF supports
> compressing the data as it is written, so either add this to your
> configuration
>
> conf.set("hfile.compression", "lzo");
>
> or add this to the job startup command
>
> -Dhfile.compression=lzo
>
> (or with another compression codec obviously).
>
> Lars
>
>
> On Tue, Dec 28, 2010 at 2:07 AM, Marc Limotte <mslimotte@gmail.com> wrote:
> > Lars, Todd,
> >
> > Thanks for the info. If I understand correctly, the importtsv command-line
> > tool does not compress by default and there is no command-line switch for
> > it, but I can modify the source at
> > hbase-0.89.20100924+28/src/main/java/org/apache/hadoop/hbase/mapreduce/ImportTsv.java
> > to call FileOutputFormat.setCompressOutput/setOutputCompressorClass() on
> > the Job in order to turn on compression.
> >
> > Does that sound right?
> >
> > Marc
> >
> >
> > On Thu, Dec 23, 2010 at 2:34 PM, Todd Lipcon <todd@cloudera.com> wrote:
> >
> >> You beat me to it, Lars! Was writing a response when some family arrived
> >> for the holidays, and when I came back, you had written just what I had
> >> started :)
> >>
> >> On Thu, Dec 23, 2010 at 1:51 PM, Lars George <lars.george@gmail.com> wrote:
> >>
> >> > live ones and then moved into place from their temp location. Not sure
> >> > what happens if the local cluster has no /hbase etc.
> >> >
> >> > Todd, could you help here?
> >> >
> >>
> >> Yep, there is a code path where if the HFiles are on a different
> >> filesystem, it will copy them to the HBase filesystem first. It's not very
> >> efficient, though, so it's probably better to distcp them to the local
> >> cluster first.
> >>
> >> -Todd
> >> --
> >> Todd Lipcon
> >> Software Engineer, Cloudera
> >>
> >
>



-- 
Todd Lipcon
Software Engineer, Cloudera
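
Lars's two suggestions in the quoted thread (calling conf.set("hfile.compression", "lzo") in code, or passing -Dhfile.compression=lzo on the command line) both end up setting the same configuration key. The sketch below illustrates that equivalence with a plain java.util.Map standing in for the Hadoop Configuration object. The class CompressionOptionSketch and the method applyDefines are hypothetical names made up for this illustration; this is not Hadoop's actual option parser, and it only handles the joined "-Dname=value" form.

```java
import java.util.ArrayList;
import java.util.HashMap;
import java.util.List;
import java.util.Map;

// Simplified stand-in for illustration only; NOT Hadoop's real option parsing.
class CompressionOptionSketch {
    // Configuration key named in the thread.
    static final String KEY = "hfile.compression";

    // Copies every "-Dname=value" argument into conf (a stand-in for the
    // job configuration) and returns the arguments that were not consumed.
    static List<String> applyDefines(String[] args, Map<String, String> conf) {
        List<String> remaining = new ArrayList<>();
        for (String arg : args) {
            int eq = arg.indexOf('=');
            if (arg.startsWith("-D") && eq > 2) {
                // Text between "-D" and "=" is the key; the rest is the value.
                conf.put(arg.substring(2, eq), arg.substring(eq + 1));
            } else {
                remaining.add(arg);
            }
        }
        return remaining;
    }

    // Programmatic form, mirroring conf.set("hfile.compression", codec).
    static Map<String, String> setInCode(String codec) {
        Map<String, String> conf = new HashMap<>();
        conf.put(KEY, codec);
        return conf;
    }
}
```

With this stand-in, applyDefines applied to the arguments "-Dhfile.compression=lzo" and "/input" fills the map with the same single entry that setInCode("lzo") produces, and leaves "/input" as the only remaining argument. In a real job the key would be set on the Hadoop Configuration as in Lars's snippet, and any other codec name could be substituted for "lzo".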
