accumulo-dev mailing list archives

From Josh Elser <josh.el...@gmail.com>
Subject Re: Snappy as default table.file.compress.type?
Date Mon, 15 Aug 2016 17:18:09 GMT
I agree it would be disruptive in the case that you outlined. This is 
why we have release notes and semver, though.

I think this change should only go into a major release, for downstream
stability. Even though how Accumulo creates and manages files is not
covered by our compatibility statement (it could be), I don't feel like
it's worth trying to shoehorn such a change into a minor release.

Adam Fuchs wrote:
> We need to consider the scenario in which somebody has written an
> application on Accumulo that uses the default compression codec. If we
> change the default, their app's behavior will change when they upgrade
> Accumulo, either because an existing table will start using snappy or
> because their app creates new tables and those new tables will start using
> snappy. This change could be disruptive, especially in the case that the
> application developers have moved on to other projects and are no longer
> available to fix the app. This is why I think you need a stronger argument
> than snappy being better on average than gzip. I agree that snappy is
> better on average, and even in most cases, but I'm not convinced we should
> change the default.
>
> Adam
>
>
> On Aug 15, 2016 8:14 AM, "Josh Elser" <josh.elser@gmail.com> wrote:
>
>> No, I never asserted that Snappy is *always* the better choice. I would
>> say that I believe Snappy is better in *most cases*.
>>
>> Most users I talk to (with and without Accumulo involved) have plenty of
>> disk space available to them. It is rare that space on disk is actually a
>> concern. Instead, performance is usually the primary metric of concern. To
>> be crystal clear, this is only my opinion on users I've talked to, not an
>> assertion on everyone.
>>
>> I do not believe I need a better argument than "on average, we can make
>> out of the box performance better for most users". I suppose we'll have to
>> disagree on that point. Thanks for clarifying your opinions on the topic.
>>
>> Adam Fuchs wrote:
>>
>>> If the crux of your argument was that snappy is always a better choice,
>>> then my retort was to say it is not, since sometimes compression ratio can
>>> be a dominant factor. Changes to defaults are disruptive for existing
>>> users, so you need a better argument. I don't mean that you shouldn't
>>> continue to debate the merits. By all means, do continue the conversation.
>>>
>>> Adam
>>>
>>> On Aug 13, 2016 8:39 PM, "Josh Elser" <josh.elser@gmail.com> wrote:
>>>
>>>> Your argument fails to address the performance benefits. I could pose the
>>>> same question back to you: you need to prove why we shouldn't use the
>>>> faster compression algorithm.
>>>>
>>>> I don't mean to be snarky, but your argument is shutting down
>>>> conversation. I appreciate you sharing the opinion but don't feel like
>>>> it's encouraging discussion.
>>>>
>>>> On Aug 13, 2016 11:18 PM, "Adam Fuchs" <afuchs@apache.org> wrote:
>>>>
>>>>> In my experience gz gets roughly 1.5x to 2x better compression than
>>>>> snappy. Snappy is definitely not a pareto improvement (although we tend
>>>>> to use snappy by default). Since it's not always better I think you
>>>>> would need a more solid argument to change the default.
>>>>>
>>>>> Adam
>>>>>
>>>>> On Aug 13, 2016 8:06 PM, "Josh Elser" <josh.elser@gmail.com> wrote:
>>>>>
>>>>>> Same motivation of using it as for making it the default. I am not
>>>>>> aware of any downside to it. It's become pretty standard across all
>>>>>> installations I've worked with for years.
>>>>>>
>>>>>> Asking because I am no oracle on the matter. I could just be ignorant
>>>>>> of some issue, but, given my current understanding, there is no
>>>>>> downside for the average case.
>>>>>>
>>>>>> Christopher wrote:
>>>>>>
>>>>>>> Sorry. I wasn't clear. I understand the motivation for using it...
>>>>>>> I'm asking about the motivation for making it the default.
>>>>>>>
>>>>>>> Since both are available, I'm not sure the default matters *that*
>>>>>>> much, but it could be an unexpected change for those preferring GZ.
>>>>>>>
>>>>>>> Also, are there any risks regarding library availability of snappy?
>>>>>>> GZ is pretty ubiquitous.
>>>>>>>
>>>>>>> On Sat, Aug 13, 2016 at 10:59 PM Josh Elser <josh.elser@gmail.com>
>>>>>>> wrote:
>>>>>>>
>>>>>>>> Uhh, besides what I already mentioned? (close in compressed size but
>>>>>>>> "much" faster)
>>>>>>>>
>>>>>>>> Christopher wrote:
>>>>>>>>
>>>>>>>>> What's the motivation for changing it?
>>>>>>>>>
>>>>>>>>> On Sat, Aug 13, 2016 at 10:47 PM Josh Elser <josh.elser@gmail.com>
>>>>>>>>> wrote:
>>>>>>>>>
>>>>>>>>>> Any reason we don't want to do this? Last rule-of-thumb I heard was
>>>>>>>>>> that snappy is often close enough in compression to GZ but quite a
>>>>>>>>>> bit faster (I don't remember exactly how much).
>>>>>>>>>>
>>>>>>>>>> - Josh
>
