accumulo-dev mailing list archives

From Josh Elser <josh.el...@gmail.com>
Subject Re: Snappy as default table.file.compress.type?
Date Mon, 15 Aug 2016 15:15:44 GMT
Ok, understood. Such a change would certainly require mention in release 
notes, user manual, etc.

Christopher wrote:
> Yes, it's a simple matter to install the dependency... it just might not be
> installed by default. I'd certainly recommend downstream vendors/packagers
> add it as a required or suggested dependency to their RPMs/DEBs/etc.,
> though.
>
> The snappy package on RHEL/CentOS provides libsnappy. The
> org.xerial.snappy:snappy-java dependency provides JNI support, but it looks
> like Hadoop doesn't use that and instead uses its own JNI stuffs. Neither
> seems to provide a non-native implementation, as far as I can tell. So, I
> guess I was wrong about that. You definitely need the native library
> installed for it to work at all.
>
> On Sat, Aug 13, 2016 at 11:42 PM Josh Elser <josh.elser@gmail.com> wrote:
>
>> That's a fair point. I'm off in nebulous vendor land and tend to be removed
>> from pure Apache Hadoop artifacts. I feel like there's a snappy package (at
>> least on CentOS) which is enough, but understanding this would be good.
>>
>> Is there a non-native snappy impl?
>>
>> On Aug 13, 2016 11:19 PM, "Christopher" <ctubbsii@apache.org> wrote:
>>
>>> Native libraries for snappy are also not typically installed by default
>>> on Linux distros. Even if the Hadoop native libraries are installed, the
>>> user is likely going to end up using the Java implementation by default, I
>>> *think*, unless they take additional actions.
>>>
>>> On Sat, Aug 13, 2016 at 11:18 PM Adam Fuchs <afuchs@apache.org> wrote:
>>>
>>>> In my experience gz gets roughly 1.5x to 2x better compression than
>>>> snappy. Snappy is definitely not a Pareto improvement (although we tend
>>>> to use snappy by default). Since it's not always better, I think you
>>>> would need a more solid argument to change the default.
>>>>
>>>> Adam
>>>>
>>>> On Aug 13, 2016 8:06 PM, "Josh Elser" <josh.elser@gmail.com> wrote:
>>>>
>>>>> The motivation for making it the default is the same as for using it. I
>>>>> am not aware of any downside to it. It's become pretty standard across
>>>>> all installations I've worked with for years.
>>>>>
>>>>> Asking because I am no oracle on the matter. I could just be ignorant
>>>>> of some issue, but, given my current understanding, there is no
>>>>> downside for the average case.
>>>>>
>>>>> Christopher wrote:
>>>>>
>>>>>> Sorry. I wasn't clear. I understand the motivation for using it... I'm
>>>>>> asking about the motivation for making it the default.
>>>>>>
>>>>>> Since both are available, I'm not sure the default matters *that*
>>>>>> much, but it could be an unexpected change for those preferring GZ.
>>>>>>
>>>>>> Also, are there any risks regarding library availability of snappy? GZ
>>>>>> is pretty ubiquitous.
>>>>>>
>>>>>> On Sat, Aug 13, 2016 at 10:59 PM Josh Elser <josh.elser@gmail.com> wrote:
>>>>>>
>>>>>>> Uhh, besides what I already mentioned? (close in compressed size but
>>>>>>> "much" faster)
>>>>>>>
>>>>>>> Christopher wrote:
>>>>>>>
>>>>>>>> What's the motivation for changing it?
>>>>>>>>
>>>>>>>> On Sat, Aug 13, 2016 at 10:47 PM Josh Elser <josh.elser@gmail.com> wrote:
>>>>>>>>
>>>>>>>>> Any reason we don't want to do this? Last rule of thumb I heard was
>>>>>>>>> that snappy is often close enough in compression to GZ but quite a bit
>>>>>>>>> faster (I don't remember exactly how much).
>>>>>>>>>
>>>>>>>>> - Josh
>>>>>>>>>
>>>>>>>>>
>
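Adam's estimate that gz gets roughly 1.5x to 2x better compression than snappy depends heavily on the data being stored, so it may be worth measuring on a representative sample before arguing about the default. Below is a minimal sketch, using only the JDK's java.util.zip, of how one might measure the gz side. The class name GzRatio, its helper methods, and the synthetic sample data are hypothetical. Measuring snappy would need an additional library (for example the org.xerial.snappy:snappy-java dependency mentioned above), which is not shown. JDK gzip may also not match the exact codec path Accumulo uses for table.file.compress.type, so treat the numbers as indicative only.

```java
import java.io.ByteArrayOutputStream;
import java.io.IOException;
import java.io.UncheckedIOException;
import java.nio.charset.StandardCharsets;
import java.util.zip.GZIPOutputStream;

public class GzRatio {
    // Gzip-compress a whole buffer in memory (JDK only; no Hadoop/Accumulo codec involved).
    static byte[] gzip(byte[] input) {
        ByteArrayOutputStream bos = new ByteArrayOutputStream();
        try (GZIPOutputStream gz = new GZIPOutputStream(bos)) {
            gz.write(input);
        } catch (IOException e) {
            // GZIPOutputStream declares IOException even though the in-memory sink never fails.
            throw new UncheckedIOException(e);
        }
        // Closing the stream above wrote the gzip trailer, so the buffer is complete here.
        return bos.toByteArray();
    }

    // Compression ratio = original size / compressed size; higher means better compression.
    static double ratio(byte[] input) {
        return (double) input.length / gzip(input).length;
    }

    // Hypothetical stand-in for real table data: repetitive key/value-like text lines.
    static byte[] sampleData(int rows) {
        StringBuilder sb = new StringBuilder();
        for (int i = 0; i < rows; i++) {
            sb.append("row_").append(i % 100)
              .append(" cf:cq value_").append(i % 7).append('\n');
        }
        return sb.toString().getBytes(StandardCharsets.UTF_8);
    }

    public static void main(String[] args) {
        byte[] sample = sampleData(10_000);
        long start = System.nanoTime();
        double r = ratio(sample);
        long micros = (System.nanoTime() - start) / 1_000;
        System.out.printf("gz: %d bytes in, ratio %.2fx, %d us%n", sample.length, r, micros);
    }
}
```

Running main prints the input size, the gz ratio, and a single-run timing. One in-memory call is only a rough indicator of speed, not a benchmark; a fair gz-vs-snappy comparison would repeat both codecs over the same real data.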
