accumulo-dev mailing list archives

From Christopher <ctubb...@apache.org>
Subject Re: Snappy as default table.file.compress.type?
Date Sun, 14 Aug 2016 05:19:17 GMT
Yes, it's a simple matter to install the dependency... it just might not be
installed by default. I'd certainly recommend downstream vendors/packagers
add it as a required or suggested dependency to their RPMs/DEBs/etc.,
though.

The snappy package on RHEL/CentOS provides libsnappy. The
org.xerial.snappy:snappy-java dependency provides JNI support, but it looks
like Hadoop doesn't use that and instead uses its own JNI code. Neither
seems to provide a non-native implementation, as far as I can tell. So, I
guess I was wrong about that. You definitely need the native library
installed for it to work at all.
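
As a rough illustration of the availability concern, the sketch below checks
whether the operating system can locate a libsnappy shared library at all. It
is only an assumption-laden probe: the helper name is hypothetical, it says
nothing about whether Hadoop's own native code can load the library, and
Hadoop's `hadoop checknative -a` command is the more direct check on a node
with Hadoop installed.

```python
import ctypes.util


def snappy_native_library():
    """Return the name of the system libsnappy shared library, or None.

    Hypothetical helper for illustration only; it relies on the platform's
    normal library lookup (for example the ldconfig cache on Linux).
    """
    return ctypes.util.find_library("snappy")


if __name__ == "__main__":
    name = snappy_native_library()
    if name is None:
        print("libsnappy not found; the distro's snappy package may be missing")
    else:
        print(f"found native snappy library: {name}")
```

A `None` result suggests the snappy package still needs to be installed before
snappy could be used for table.file.compress.type on that host.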

On Sat, Aug 13, 2016 at 11:42 PM Josh Elser <josh.elser@gmail.com> wrote:

> That's a fair point. I'm off in nebulous vendor land and tend to be removed
> from pure Apache Hadoop artifacts. I feel like there's a snappy package (at
> least on centos) which is enough, but understanding this would be good.
>
> Is there a nonnative snappy impl?
>
> On Aug 13, 2016 11:19 PM, "Christopher" <ctubbsii@apache.org> wrote:
>
> > Native libraries for snappy are also not typically installed by default on
> > Linux distros. Even if the hadoop native libraries are installed, the user
> > is likely going to end up using the Java implementation by default, I
> > *think*, unless they take additional actions.
> >
> > On Sat, Aug 13, 2016 at 11:18 PM Adam Fuchs <afuchs@apache.org> wrote:
> >
> > > In my experience gz gets roughly 1.5x to 2x better compression than snappy.
> > > Snappy is definitely not a pareto improvement (although we tend to use
> > > snappy by default). Since it's not always better I think you would need a
> > > more solid argument to change the default.
> > >
> > > Adam
> > >
> > > On Aug 13, 2016 8:06 PM, "Josh Elser" <josh.elser@gmail.com> wrote:
> > >
> > > > Same motivation of using it as for making it the default. I am not aware
> > > > of any downside to it. It's become pretty standard across all installations
> > > > I've worked with for years.
> > > >
> > > > Asking because I am no oracle on the matter. I could just be ignorant of
> > > > some issue, but, given my current understanding, there is no downside for
> > > > the average case.
> > > >
> > > > Christopher wrote:
> > > >
> > > >> Sorry. I wasn't clear. I understand the motivation for using it... I'm
> > > >> asking about the motivation for making it the default.
> > > >>
> > > >> Since both are available, I'm not sure the default matters *that* much,
> > > >> but it could be an unexpected change for those preferring GZ.
> > > >>
> > > >> Also, are there any risks regarding library availability of snappy? GZ is
> > > >> pretty ubiquitous.
> > > >>
> > > >> On Sat, Aug 13, 2016 at 10:59 PM Josh Elser <josh.elser@gmail.com> wrote:
> > > >>
> > > >>> Uhh, besides what I already mentioned? (close in compressed size but
> > > >>> "much" faster)
> > > >>>
> > > >>> Christopher wrote:
> > > >>>
> > > >>>> What's the motivation for changing it?
> > > >>>>
> > > >>>> On Sat, Aug 13, 2016 at 10:47 PM Josh Elser <josh.elser@gmail.com> wrote:
> > > >>>>
> > > >>>>> Any reason we don't want to do this? Last rule-of-thumb I heard was that
> > > >>>>> snappy is often close enough in compression to GZ but quite a bit faster
> > > >>>>> (I don't remember exactly how much).
> > > >>>>>
> > > >>>>> - Josh
> > > >>>>>
> > > >>>>>
> > > >>
> > >
> >
>
