hadoop-common-user mailing list archives

From Todd Lipcon <t...@cloudera.com>
Subject Re: Enabling LZO compression of map outputs in Cloudera Hadoop 0.20.1
Date Mon, 09 Aug 2010 02:34:10 GMT
On Sat, Aug 7, 2010 at 9:18 PM, Alex Luya <alexander.luya@gmail.com> wrote:

> Does it (hadoop-lzo) only work with Hadoop 0.20, and not with 0.21 or 0.22?
>

I don't know that anyone has tested it against 0.21 or trunk, but I don't
see any reason it wouldn't work just fine -- the APIs are pretty stable
from 0.20 onward.

-Todd


> On Friday, August 06, 2010 09:05:47 am Todd Lipcon wrote:
> > On Thu, Aug 5, 2010 at 4:52 PM, Bobby Dennett <bdennett@gmail.com> wrote:
> > > Hi Josh,
> > >
> > > No real pain points... just trying to investigate/research the "best"
> > > way to create the necessary libraries and jar files to support LZO
> > > compression in Hadoop. In particular, there are the 2 "repositories"
> > > to build from and I am trying to find out if one should be used over
> > > the other. For instance, in your previous posting, you refer to
> > > hadoop-gpl-compression while the Twitter blog post from last year
> > > mentions the Hadoop-LZO project. Briefly looking, it seems Hadoop-LZO
> > > is preferable but we're curious if there are any caveats/gotchas we
> > > should be aware of.
> >
> > Yes, definitely use the hadoop-lzo project from github -- either from my
> > repo or from kevinweil's (the two are kept in sync)
> >
> > The repo on Google Code has a number of known bugs, which is why we
> > forked it over to github last year.
> >
> > -Todd
> >
> > On Thu, Aug 5, 2010 at 15:59, Josh Patterson <josh@cloudera.com> wrote:
> > > > Bobby,
> > > >
> > > > We're working hard to make compression easier; the biggest hurdle
> > > > currently is the licensing of the LZO codec libs (GPL, which is
> > > > not compatible with the ASF's BSD-style license).
> > > >
> > > > Outside of making the changes to the mapred-site.xml file, what do
> > > > you view as the biggest pain point with your setup?
> > > >
> > > > Josh Patterson
> > > > Cloudera
> > > >
> > > > On Thu, Aug 5, 2010 at 6:52 PM, Bobby Dennett <bdennett+software@gmail.com> wrote:
> > > >> We are looking to enable LZO compression of the map outputs on our
> > > >> Cloudera 0.20.1 cluster. It seems there are various sets of
> > > >> instructions available and I am curious what your thoughts are
> > > > >> regarding which one would be best for our Hadoop distribution and OS
> > > > >> (Ubuntu 8.04 64-bit). In particular, hadoop-gpl-compression
> > > >> (http://code.google.com/p/hadoop-gpl-compression) vs. hadoop-lzo
> > > >> (http://github.com/kevinweil/hadoop-lzo).
> > > >>
> > > >> Some of what appear to be the better instructions/guides out there:
> > > >> * Josh Patterson's reply on June 25th to the "Newbie to HDFS
> > > >> compression" thread --
> > > > >> http://mail-archives.apache.org/mod_mbox/hadoop-common-user/201006.mbox/%3CAANLkTileo-q8USEiP8Y3Na9pDYHlyUFIPpR0In0LkpJm@mail.gmail.com%3E
> > > >> * hadoop-gpl-compression FAQ --
> > > >> http://code.google.com/p/hadoop-gpl-compression/wiki/FAQ
> > > >> * "Hadoop at Twitter (part 1): Splittable LZO Compression" blog post
> > > >> --
> > >
> > >
> http://www.cloudera.com/blog/2009/11/hadoop-at-twitter-part-1-splittable-
> > > lzo-compression/
> > >
> > > >> Thanks in advance,
> > > >> -Bobby
>
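
For reference, here is a rough sketch of the mapred-site.xml changes mentioned
in the thread for compressing map outputs with LZO. It is only an illustration:
the two property names are the ones used by Hadoop 0.20-era MapReduce, and the
codec class name assumes the hadoop-lzo jar and its native libraries are already
installed on every node. The helper `render_properties` and the dictionary name
are hypothetical, and the script only prints the <property> entries to paste
inside <configuration>; it does not touch a cluster.

```python
from xml.etree import ElementTree as ET

# Assumed settings for LZO-compressed map outputs on Hadoop 0.20.
# The codec class is only usable if hadoop-lzo is installed cluster-wide.
MAP_OUTPUT_LZO = {
    "mapred.compress.map.output": "true",
    "mapred.map.output.compression.codec": "com.hadoop.compression.lzo.LzoCodec",
}


def render_properties(props):
    """Return one <property> element per entry, as text, one per line."""
    chunks = []
    for name, value in props.items():
        prop = ET.Element("property")
        ET.SubElement(prop, "name").text = name
        ET.SubElement(prop, "value").text = value
        chunks.append(ET.tostring(prop, encoding="unicode"))
    return "\n".join(chunks)


if __name__ == "__main__":
    # Paste the printed entries inside <configuration> in mapred-site.xml.
    print(render_properties(MAP_OUTPUT_LZO))
```

Jobs would typically need to be resubmitted, and depending on the setup the
daemons restarted, before the new settings take effect.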



-- 
Todd Lipcon
Software Engineer, Cloudera
