accumulo-dev mailing list archives

From Josh Elser <josh.el...@gmail.com>
Subject Re: Snappy as default table.file.compress.type?
Date Mon, 15 Aug 2016 15:13:53 GMT
No, I never asserted that Snappy is *always* the better choice. I would 
say that I believe Snappy is better in *most cases*.

Most users I talk to (with and without Accumulo involved) have plenty of 
disk space available to them. It is rare that space on disk is actually 
a concern. Instead, performance is usually the primary metric of 
concern. To be crystal clear, this is only my opinion on users I've 
talked to, not an assertion on everyone.

I do not believe I need a better argument than "on average, we can make 
out of the box performance better for most users". I suppose we'll have 
to disagree on that point. Thanks for clarifying your opinions on the topic.

Adam Fuchs wrote:
> If the crux of your argument was that snappy is always a better choice,
> then my retort was to say it is not, since sometimes compression ratio can
> be a dominant factor. Changes to defaults are disruptive for existing
> users, so you need a better argument. I don't mean that you shouldn't
> continue to debate the merits. By all means, do continue the conversation.
>
> Adam
>
> On Aug 13, 2016 8:39 PM, "Josh Elser"<josh.elser@gmail.com>  wrote:
>> Your argument fails to address the performance benefits. I could pose the
>> same question back to you: you need to prove why we shouldn't use the
>> faster compression algorithm.
>>
>> I don't mean to be snarky, but your argument is shutting down conversation.
>> I appreciate you sharing the opinion but don't feel like it's encouraging
>> discussion.
>>
>> On Aug 13, 2016 11:18 PM, "Adam Fuchs"<afuchs@apache.org>  wrote:
>>
>>> In my experience gz gets roughly 1.5x to 2x better compression than snappy.
>>> Snappy is definitely not a pareto improvement (although we tend to use
>>> snappy by default). Since it's not always better I think you would need a
>>> more solid argument to change the default.
>>>
>>> Adam
>>>
>>> On Aug 13, 2016 8:06 PM, "Josh Elser"<josh.elser@gmail.com>  wrote:
>>>
>>>> Same motivation of using it as for making it the default. I am not aware
>>>> of any downside to it. It's become pretty standard across all installations
>>>> I've worked with for years.
>>>>
>>>> Asking because I am no oracle on the matter. I could just be ignorant of
>>>> some issue, but, given my current understanding, there is no downside for
>>>> the average case.
>>>>
>>>> Christopher wrote:
>>>>
>>>>> Sorry. I wasn't clear. I understand the motivation for using it... I'm
>>>>> asking about the motivation for making it the default.
>>>>>
>>>>> Since both are available, I'm not sure the default matters *that* much,
>>>>> but it could be an unexpected change for those preferring GZ.
>>>>>
>>>>> Also, are there any risks regarding library availability of snappy? GZ is
>>>>> pretty ubiquitous.
>>>>>
>>>>> On Sat, Aug 13, 2016 at 10:59 PM Josh Elser<josh.elser@gmail.com> wrote:
>>>>>> Uhh, besides what I already mentioned? (close in compressed size but
>>>>>> "much" faster)
>>>>>>
>>>>>> Christopher wrote:
>>>>>>
>>>>>>> What's the motivation for changing it?
>>>>>>>
>>>>>>> On Sat, Aug 13, 2016 at 10:47 PM Josh Elser<josh.elser@gmail.com> wrote:
>>>>>>>
>>>>>>>> Any reason we don't want to do this? Last rule-of-thumb I heard was that
>>>>>>>> snappy is often close enough in compression to GZ but quite a bit faster
>>>>>>>> (I don't remember exactly how much).
>>>>>>>>
>>>>>>>> - Josh
>>>>>>>>
>>>>>>>>
>