accumulo-dev mailing list archives

From Josh Elser <josh.el...@gmail.com>
Subject Re: Snappy as default table.file.compress.type?
Date Mon, 15 Aug 2016 17:24:58 GMT
I think I like this idea. The configuration templates we have, along 
with bin/bootstrap_config.sh, would make this easily consumable, IMO.

I'm not sure how many users know about/use that script though. I know 
that I personally still copy out of conf/examples/3GB-native and modify 
to suit my whims at the moment.

Marc P. wrote:
> Perhaps there is a happy medium, though, by not necessarily defining
> example configurations by the size of your memory footprint, but instead by
> performance configuration? Snappy could be the default for those who want a
> faster but less space cognizant implementation. Christopher's concerns
> would be allayed, and perhaps those who try Accumulo may get better
> performance by using Snappy?
>
> On Sat, Aug 13, 2016 at 11:19 PM, Christopher<ctubbsii@apache.org>  wrote:
>
>> Native libraries for snappy are also not typically installed by default on
>> Linux distros. Even if the hadoop native libraries are installed, the user
>> is likely going to end up using the Java implementation by default, I
>> *think*, unless they take additional actions.
>>
>> On Sat, Aug 13, 2016 at 11:18 PM Adam Fuchs<afuchs@apache.org>  wrote:
>>
>>> In my experience gz gets roughly 1.5x to 2x better compression than
>>> snappy.
>>> Snappy is definitely not a pareto improvement (although we tend to use
>>> snappy by default). Since it's not always better I think you would need a
>>> more solid argument to change the default.
>>>
>>> Adam
>>>
>>> On Aug 13, 2016 8:06 PM, "Josh Elser"<josh.elser@gmail.com>  wrote:
>>>
>>>> Same motivation of using it as for making it the default. I am not aware
>>>> of any downside to it. It's become pretty standard across all installations
>>>> I've worked with for years.
>>>>
>>>> Asking because I am no oracle on the matter. I could just be ignorant of
>>>> some issue, but, given my current understanding, there is no downside for
>>>> the average case.
>>>>
>>>> Christopher wrote:
>>>>
>>>>> Sorry. I wasn't clear. I understand the motivation for using it... I'm
>>>>> asking about the motivation for making it the default.
>>>>>
>>>>> Since both are available, I'm not sure the default matters *that* much,
>>>>> but it could be an unexpected change for those preferring GZ.
>>>>>
>>>>> Also, are there any risks regarding library availability of snappy? GZ is
>>>>> pretty ubiquitous.
>>>>>
>>>>> On Sat, Aug 13, 2016 at 10:59 PM Josh Elser<josh.elser@gmail.com> wrote:
>>>>>
>>>>>> Uhh, besides what I already mentioned? (close in compressed size but
>>>>>> "much" faster)
>>>>>>
>>>>>> Christopher wrote:
>>>>>>
>>>>>>> What's the motivation for changing it?
>>>>>>>
>>>>>>> On Sat, Aug 13, 2016 at 10:47 PM Josh Elser<josh.elser@gmail.com> wrote:
>>>>>>>
>>>>>>>> Any reason we don't want to do this? Last rule-of-thumb I heard was that
>>>>>>>> snappy is often close enough in compression to GZ but quite a bit faster
>>>>>>>> (I don't remember exactly how much).
>>>>>>>>
>>>>>>>> - Josh
>>>>>>>>
>>>>>>>>
>
