cassandra-dev mailing list archives

From Jeff Jirsa <jji...@gmail.com>
Subject Re: Consistent vs inconsistent range movements
Date Fri, 03 Mar 2017 23:44:34 GMT
Imagine you have a cluster with RF=3, and you write a key with CL:QUORUM,
it goes to nodes 1 and 3, but node 2 is offline.

Some time later, node 2 comes online.

Then you want to add node 2.5 in between nodes 1 and 3.

If you stream data from node 2, you violate consistency guarantees (quorum)
- for data on node 2.5, you'll have data equal to node 2, which means a
query with CL:QUORUM may query nodes 2 and 2.5 and return "no data", even
though it's on node 1 (and was on node 3, before 3 stopped being a replica).

To solve this problem, https://issues.apache.org/jira/browse/CASSANDRA-2434
made it so we always try to do "consistent range movements" - that is, when
we add node 2.5, we stream from the replica it's replacing (3) - now when
you query with quorum, you're guaranteed that at least 2 of the 3 remaining
replicas have the data (if you wrote with quorum), and everything works as
expected.
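
To make the scenario above concrete, here is a small toy model in Python. It
is only a sketch of the reasoning, not Cassandra code: the node names, the
helper functions quorum_reads_ok and add_node, and the assumption that node
2.5 ends up holding exactly what its stream source held are all made up for
illustration.

```python
from itertools import combinations


def quorum_reads_ok(replicas, holders):
    """True if every quorum of `replicas` contains at least one node in `holders`."""
    quorum = len(replicas) // 2 + 1
    return all(
        any(node in holders for node in group)
        for group in combinations(replicas, quorum)
    )


def add_node(holders, new_node, stream_source, leaving):
    """Holders of the key after `new_node` streams from `stream_source`
    and `leaving` stops being a replica for the range (toy assumption:
    the new node ends up with exactly what its source had)."""
    new_holders = set(holders) - {leaving}
    if stream_source in holders:
        new_holders.add(new_node)
    return new_holders


# The QUORUM write reached nodes 1 and 3; node 2 was offline and missed it.
old_holders = {"1", "3"}
# After node 2.5 joins, node 3 is no longer a replica for this range.
new_replicas = ["1", "2", "2.5"]

# Inconsistent movement: stream from node 2, which never got the write.
inconsistent = add_node(old_holders, "2.5", stream_source="2", leaving="3")
# Consistent movement: stream from node 3, the replica being replaced.
consistent = add_node(old_holders, "2.5", stream_source="3", leaving="3")

print(quorum_reads_ok(new_replicas, inconsistent))  # False: nodes 2 and 2.5 both miss the key
print(quorum_reads_ok(new_replicas, consistent))    # True: every pair includes node 1 or 2.5
```

In the inconsistent case only node 1 holds the key, so the quorum made of
nodes 2 and 2.5 returns "no data". In the consistent case nodes 1 and 2.5
hold it, and any two of the three replicas include at least one of them.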

This guarantee isn't free - strict range movements can limit your ability
to move (or join, or remove) multiple nodes at the same time (especially in
VNode clusters), so there's a system property to turn it off if you really
really really really really know what you're doing (which I imagine you see
in the code, which is why you're asking).

Definitely read CASSANDRA-2434. That's probably the best documentation of
this feature.




On Fri, Mar 3, 2017 at 2:04 PM, benjamin roth <brstgt@gmail.com> wrote:

> Hi,
>
> Can anyone tell the difference between consistent + inconsistent range
> movements?
> What exactly makes them consistent or inconsistent?
> In what situations can both of them occur?
>
> It would be great to get a correct and deep understanding of that for
> further MV improvements. My intuition tells me that a rebuild / removenode
> can break MV consistency, but to prove it I need more information.
> I am also happy about code references - it's just very tedious to read all
> through the code to get an overview of all that without some prose
> information.
>
> Thanks in advance
>
