accumulo-dev mailing list archives

From Josh Elser
Subject Re: Snappy as default table.file.compress.type?
Date Mon, 15 Aug 2016 17:24:58 GMT
I think I like this idea. The configuration templates we have, along 
with bin/, would make this easily consumable, IMO.

I'm not sure how many users know about/use that script though. I know 
that I personally still copy out of conf/examples/3GB-native and modify 
to suit my whims at the moment.

Marc P. wrote:
> Perhaps there is a happy medium, though, by not necessarily defining
> example configurations by the size of your memory footprint, but instead by
> performance configuration? Snappy could be the default for those who want a
> faster but less space cognizant implementation. Christopher's concerns
> would be allayed, and perhaps those who try Accumulo may get better
> performance by using Snappy?
> On Sat, Aug 13, 2016 at 11:19 PM, Christopher wrote:
>> Native libraries for snappy are also not typically installed by default on
>> Linux distros. Even if the hadoop native libraries are installed, the user
>> is likely going to end up using the Java implementation by default, I
>> *think*, unless they take additional actions.
>> On Sat, Aug 13, 2016 at 11:18 PM Adam Fuchs wrote:
>>> In my experience gz gets roughly 1.5x to 2x better compression than
>>> snappy.
>>> Snappy is definitely not a Pareto improvement (although we tend to use
>>> snappy by default). Since it's not always better I think you would need a
>>> more solid argument to change the default.
>>> Adam
>>> On Aug 13, 2016 8:06 PM, "Josh Elser" wrote:
>>>> Same motivation for using it as for making it the default. I am not
>>>> aware of any downside to it. It's become pretty standard across all
>>>> installations I've worked with for years.
>>>> Asking because I am no oracle on the matter. I could just be ignorant
>>>> of some issue, but, given my current understanding, there is no
>>>> downside for the average case.
>>>> Christopher wrote:
>>>>> Sorry. I wasn't clear. I understand the motivation for using it... I'm
>>>>> asking about the motivation for making it the default.
>>>>> Since both are available, I'm not sure the default matters *that*
>>>>> much, but it could be an unexpected change for those preferring GZ.
>>>>> Also, are there any risks regarding library availability of snappy?
>>>>> GZ is pretty ubiquitous.
>>>>> On Sat, Aug 13, 2016 at 10:59 PM Josh Elser wrote:
>>>>>> Uhh, besides what I already mentioned? (close in compressed size but
>>>>>> "much" faster)
>>>>>> Christopher wrote:
>>>>>>> What's the motivation for changing it?
>>>>>>> On Sat, Aug 13, 2016 at 10:47 PM Josh Elser wrote:
>>>>>>>> Any reason we don't want to do this? Last rule-of-thumb I heard
>>>>>>>> was that snappy is often close enough in compression to GZ but
>>>>>>>> quite a bit faster (I don't remember exactly how much).
>>>>>>>> - Josh
