cassandra-commits mailing list archives

From "Sylvain Lebresne (JIRA)" <>
Subject [jira] [Commented] (CASSANDRA-13910) Consider deprecating (then removing) read_repair_chance/dclocal_read_repair_chance
Date Tue, 03 Oct 2017 08:54:02 GMT


Sylvain Lebresne commented on CASSANDRA-13910:

bq. +1 to creating a thread on @dev / @user for feedback

Not sure everybody saw it, but I did send an email to the user list a few days ago [here|].
No feedback on that thread just yet, but I'm happy to wait at least 1-2 more weeks before
making any move.

bq. I would believe that someone, somewhere is relying on it.

C* is successful enough that anything that has been in the product for some amount of time
is probably relied upon by someone, somewhere, for some definition of "rely". Requiring a guarantee
that this is not the case before removing anything would amount to removing nothing, and
that would, imo, be dangerous for the project. But don't get me wrong, I'm totally happy to
wait longer to see if we get at least a few genuine real-life reports of cases I haven't
thought about where those options clearly provide a good trade-off. Short of that, though,
I suggest we move ahead with this rather than keep something we agree is more harmful than
helpful most of the time (and we seem to more or less agree on that) on the off chance that
removing it may piss off somebody somewhere.

> Consider deprecating (then removing) read_repair_chance/dclocal_read_repair_chance
> ----------------------------------------------------------------------------------
>                 Key: CASSANDRA-13910
>                 URL:
>             Project: Cassandra
>          Issue Type: Improvement
>            Reporter: Sylvain Lebresne
>            Priority: Minor
>              Labels: CommunityFeedbackRequested
> First, let me clarify, so this is not misunderstood, that I'm not *at all* suggesting we
remove the read-repair mechanism of detecting and repairing inconsistencies between read responses:
that mechanism is, imo, fine and useful. But the {{read_repair_chance}} and {{dclocal_read_repair_chance}}
options have never been about _enabling_ that mechanism; they are about querying all replicas (even
when this is not required by the consistency level) for the sole purpose of maybe read-repairing
some of the replicas that wouldn't have been queried otherwise. Which, btw, brings me to reason
1 for considering their removal: their naming/behavior is super confusing. Over the years,
I've seen countless users (and not only newbies) misunderstand what those options do and,
as a consequence, misunderstand when read-repair itself was happening.
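> To make that distinction concrete, here is a minimal Python sketch of the per-read decision those options control. It is a simplified model, not Cassandra's code: it assumes one uniform random draw per read, compared first against {{read_repair_chance}} and then against {{dclocal_read_repair_chance}}, and the names {{ReadRepairDecision}}, {{read_repair_decision}} and {{replicas_contacted}} are hypothetical. The only thing the draw changes is how many replicas execute the read; detecting and repairing mismatches among the responses happens either way.

```python
import random
from enum import Enum
from typing import Callable


class ReadRepairDecision(Enum):
    NONE = "none"          # read only the replicas the consistency level needs
    DC_LOCAL = "dc_local"  # also read every replica in the local datacenter
    GLOBAL = "global"      # also read every replica in every datacenter


def read_repair_decision(read_repair_chance: float,
                         dclocal_read_repair_chance: float,
                         rng: Callable[[], float] = random.random) -> ReadRepairDecision:
    """Hypothetical model: one random draw per read, global chance checked first."""
    roll = rng()  # uniform in [0.0, 1.0)
    if roll < read_repair_chance:
        return ReadRepairDecision.GLOBAL
    if roll < dclocal_read_repair_chance:
        return ReadRepairDecision.DC_LOCAL
    return ReadRepairDecision.NONE


def replicas_contacted(decision: ReadRepairDecision,
                       needed_local: int, needed_remote: int,
                       local_replicas: int, remote_replicas: int) -> int:
    """How many replicas execute a single read under this simplified model."""
    if decision is ReadRepairDecision.GLOBAL:
        return local_replicas + remote_replicas   # every replica, every DC
    if decision is ReadRepairDecision.DC_LOCAL:
        return local_replicas + needed_remote     # whole local DC, plus what the CL needs remotely
    return needed_local + needed_remote           # only what the consistency level requires


if __name__ == "__main__":
    # Two DCs with 3 replicas each, reading at LOCAL_QUORUM (2 local replicas needed).
    for d in ReadRepairDecision:
        print(d.name, replicas_contacted(d, needed_local=2, needed_remote=0,
                                         local_replicas=3, remote_replicas=3))
    # NONE 2
    # DC_LOCAL 3
    # GLOBAL 6
```

In this model, whenever the draw lands on {{DC_LOCAL}} or {{GLOBAL}}, the extra replicas do real work on that read, whether or not any of them is actually inconsistent.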
> But my 2nd reason for suggesting this is that I suspect {{read_repair_chance}}/{{dclocal_read_repair_chance}}
are, especially nowadays, more harmful than anything else when enabled. When those options
kick in, what you trade off is additional resource consumption (all replicas have to execute
the read) for a _fairly remote chance_ of having some inconsistencies repaired on _some_ replicas
_a bit faster_ than they would be otherwise. To justify that last part, let's recall that:
> # most inconsistencies are actually fixed by hints in practice; and in the case where
a node stays dead long enough for its hints to time out, you really should repair
the node when it comes back (if not simply re-bootstrap it). Read-repair probably doesn't
fix _that_ much stuff in the first place.
> # again, read-repair does happen without those options kicking in. If you do reads at {{QUORUM}},
inconsistencies will eventually get read-repaired all the same, just a tiny bit less quickly.
> # I suspect almost everyone uses a low "chance" for those options at best (because the
extra resource consumption is real), so at the end of the day, it's up to chance how much
faster this fixes inconsistencies.
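> To put rough numbers on that trade-off, the sketch below assumes each read independently triggers the extra replica reads with probability equal to the configured chance. It computes the expected extra replica reads per client read and the probability that at least one of n reads of a partition triggers them. The function names are hypothetical, and the model deliberately ignores hints, replica selection by the snitch, and the regular read-repair that already happens at {{QUORUM}}.

```python
def expected_extra_replica_reads(chance: float, needed: int, total: int) -> float:
    """Expected additional replica reads per client read (hypothetical model).

    With probability `chance` the read goes to all `total` replicas instead of
    only the `needed` ones, so (total - needed) extra replicas do the work.
    """
    return chance * (total - needed)


def prob_triggered_within(chance: float, reads: int) -> float:
    """Probability that at least one of `reads` independent reads triggers."""
    return 1.0 - (1.0 - chance) ** reads


if __name__ == "__main__":
    # Single DC, RF=3, reads at QUORUM (2 replicas needed), chance = 0.1.
    extra = expected_extra_replica_reads(0.1, needed=2, total=3)
    print(f"extra replica reads per client read: {extra:.2f}")  # 0.10
    # Relative to the 2 replica reads QUORUM performs anyway.
    print(f"relative extra load: {extra / 2:.0%}")              # 5%
    print(f"P(triggered within 10 reads): {prob_triggered_within(0.1, 10):.2f}")  # 0.65
```

Under these assumptions the extra load is paid on every triggered read, whereas whether a particular stale replica is among the extra ones, and how much sooner it gets fixed, is left to chance.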
> Overall, I'm having a hard time imagining real cases where that trade-off really makes
sense. Don't get me wrong, those options had their place a long time ago when hints weren't
working all that well, but I think they bring more confusion than benefit now.
> And I think it's sane to reconsider things every once in a while, and to clean up anything
that may not make all that much sense anymore, which I think is the case here.
> TL;DR: I feel the benefits brought by those options are very slim at best, well overshadowed
by the confusion they bring, and not worth maintaining the code that supports them (which,
to be fair, isn't huge, but getting rid of {{ReadCallback.AsyncRepairRunner}} wouldn't hurt,
for instance).
> Lastly, if the consensus here ends up being that they can have their use in weird cases
and that we feel supporting those cases is worth confusing everyone else and maintaining that
code, I would still suggest disabling them entirely by default.

This message was sent by Atlassian JIRA
