accumulo-dev mailing list archives

From Adam Fuchs <afu...@apache.org>
Subject Re: Snappy as default table.file.compress.type?
Date Mon, 15 Aug 2016 16:31:56 GMT
We need to consider the scenario in which somebody has written an
application on Accumulo that uses the default compression codec. If we
change the default, their app's behavior will change when they upgrade
Accumulo, either because an existing table will start using snappy or
because their app creates new tables and those new tables will start using
snappy. This change could be disruptive, especially in the case that the
application developers have moved on to other projects and are no longer
available to fix the app. This is why I think you need a stronger argument
than snappy being better on average than gzip. I agree that snappy is
better on average, and even in most cases, but I'm not convinced we should
change the default.

Adam


On Aug 15, 2016 8:14 AM, "Josh Elser" <josh.elser@gmail.com> wrote:

> No, I never asserted that Snappy is *always* the better choice. I would
> say that I believe Snappy is better in *most cases*.
>
> Most users I talk to (with and without Accumulo involved) have plenty of
> disk space available to them. It is rare that space on disk is actually a
> concern. Instead, performance is usually the primary metric of concern. To
> be crystal clear, this is only my opinion on users I've talked to, not an
> assertion on everyone.
>
> I do not believe I need a better argument than "on average, we can make
> out of the box performance better for most users". I suppose we'll have to
> disagree on that point. Thanks for clarifying your opinions on the topic.
>
> Adam Fuchs wrote:
>
>> If the crux of your argument was that snappy is always a better choice,
>> then my retort was to say it is not, since sometimes compression ratio can
>> be a dominant factor. Changes to defaults are disruptive for existing
>> users, so you need a better argument. I don't mean that you shouldn't
>> continue to debate the merits. By all means, do continue the conversation.
>>
>> Adam
>>
>> On Aug 13, 2016 8:39 PM, "Josh Elser"<josh.elser@gmail.com>  wrote:
>>
>>> Your argument fails to address the performance benefits. I could pose the
>>> same question back to you: you need to prove why we shouldn't use the
>>> faster compression algorithm.
>>>
>>> I don't mean to be snarky, but your argument is shutting down
>>> conversation. I appreciate you sharing the opinion but don't feel like
>>> it's encouraging discussion.
>>>
>>> On Aug 13, 2016 11:18 PM, "Adam Fuchs"<afuchs@apache.org>  wrote:
>>>
>>>> In my experience gz gets roughly 1.5x to 2x better compression than
>>>> snappy. Snappy is definitely not a pareto improvement (although we tend
>>>> to use snappy by default). Since it's not always better I think you
>>>> would need a more solid argument to change the default.
>>>>
>>>> Adam
>>>>
>>>> On Aug 13, 2016 8:06 PM, "Josh Elser"<josh.elser@gmail.com>  wrote:
>>>>
>>>>> Same motivation of using it as for making it the default. I am not
>>>>> aware of any downside to it. It's become pretty standard across all
>>>>> installations I've worked with for years.
>>>>>
>>>>> Asking because I am no oracle on the matter. I could just be ignorant
>>>>> of some issue, but, given my current understanding, there is no
>>>>> downside for the average case.
>>>>>
>>>>> Christopher wrote:
>>>>>
>>>>>> Sorry. I wasn't clear. I understand the motivation for using it...
>>>>>> I'm asking about the motivation for making it the default.
>>>>>>
>>>>>> Since both are available, I'm not sure the default matters *that*
>>>>>> much, but it could be an unexpected change for those preferring GZ.
>>>>>>
>>>>>> Also, are there any risks regarding library availability of snappy?
>>>>>> GZ is pretty ubiquitous.
>>>>>>
>>>>>> On Sat, Aug 13, 2016 at 10:59 PM Josh Elser<josh.elser@gmail.com>
>>>>>> wrote:
>>>>>>
>>>>>>> Uhh, besides what I already mentioned? (close in compressed size but
>>>>>>> "much" faster)
>>>>>>>
>>>>>>> Christopher wrote:
>>>>>>>
>>>>>>>> What's the motivation for changing it?
>>>>>>>>
>>>>>>>> On Sat, Aug 13, 2016 at 10:47 PM Josh Elser<josh.elser@gmail.com>
>>>>>>>> wrote:
>>>>>>>>
>>>>>>>>> Any reason we don't want to do this? Last rule-of-thumb I heard
>>>>>>>>> was that snappy is often close enough in compression to GZ but
>>>>>>>>> quite a bit faster (I don't remember exactly how much).
>>>>>>>>>
>>>>>>>>> - Josh
