cassandra-commits mailing list archives

From "Brandon Williams (JIRA)" <j...@apache.org>
Subject [jira] [Commented] (CASSANDRA-7592) Ownership changes can violate consistency
Date Tue, 22 Jul 2014 23:10:39 GMT

    [ https://issues.apache.org/jira/browse/CASSANDRA-7592?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14071095#comment-14071095 ]

Brandon Williams commented on CASSANDRA-7592:
---------------------------------------------

bq. This could be solved by continuing writes to the old replica for some time (maybe ring
delay) after the ownership changes.

That seems reasonable. We could also make the 'joined' announcement both active (RPC) and
passive (gossip), as is done for schema changes, to narrow the window considerably.
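Using the token range 0-50 from the quoted description below (old replicas A, B, C; new replicas X, A, B), here is a minimal sketch of the "keep writing to the old replica" idea. It brute-forces every possible ack set and read set with `itertools.combinations`. Requiring 3 acks on the extended write set is an assumption borrowed from how writes behave during bootstrap, because the comment does not specify an ack count. `OLD_REPLICAS`, `NEW_REPLICAS`, `quorums` and `always_intersect` are hypothetical names, not Cassandra code.

```python
from itertools import combinations

# Hypothetical model of token range 0-50 from CASSANDRA-7592 (RF = 3).
OLD_REPLICAS = {"A", "B", "C"}   # owners before X bootstraps
NEW_REPLICAS = {"X", "A", "B"}   # owners after X bootstraps


def quorums(replicas, size):
    """Every subset of `replicas` with `size` members (possible ack or read sets)."""
    return [set(q) for q in combinations(sorted(replicas), size)]


def always_intersect(write_sets, read_sets):
    """True if every possible write ack set shares a replica with every read set."""
    return all(w & r for w in write_sets for r in read_sets)


# Reads can come from a coordinator on either view, each contacting 2 of 3 replicas.
old_view_reads = quorums(OLD_REPLICAS, 2)
new_view_reads = quorums(NEW_REPLICAS, 2)

# Assumed mitigation: after switching, keep writing to the old replica C too
# (X, A, B, C) and require 3 acks, as during bootstrap.
extended_writes = quorums(NEW_REPLICAS | OLD_REPLICAS, 3)
safe_with_3_acks = (always_intersect(extended_writes, old_view_reads)
                    and always_intersect(extended_writes, new_view_reads))

# Same extended write set, but only 2 acks required: {X, B} can still miss {A, C}.
loose_writes = quorums(NEW_REPLICAS | OLD_REPLICAS, 2)
safe_with_2_acks = always_intersect(loose_writes, old_view_reads)

print(safe_with_3_acks, safe_with_2_acks)  # True False
```

In this model the ack count matters as much as the extra target: sending the write to C while still accepting 2 acks lets X and B alone satisfy it, which reproduces the t2/t3 interleaving.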

> Ownership changes can violate consistency
> -----------------------------------------
>
>                 Key: CASSANDRA-7592
>                 URL: https://issues.apache.org/jira/browse/CASSANDRA-7592
>             Project: Cassandra
>          Issue Type: Improvement
>            Reporter: Richard Low
>
> CASSANDRA-2434 goes a long way toward avoiding consistency violations when growing a cluster.
> However, there is still a window in which consistency can be violated while ownership
> of a range is being switched.
> Suppose you have replication factor 3 and all reads and writes at quorum. The first part
> of the ring looks like this:
> Z: 0
> A: 100
> B: 200
> C: 300
> Choose two random coordinators, C1 and C2. Then you bootstrap node X at token 50.
> Consider the token range 0-50. Before bootstrap, this is stored on A, B, C. During bootstrap,
> writes go to X, A, B, C (and must succeed on 3) and reads choose two from A, B, C. After bootstrap,
> the range is on X, A, B.
> When the bootstrap completes, suppose C1 processes the ownership change at t1 and C2
> at t4. Then the following can give an inconsistency:
> t1: C1 switches ownership.
> t2: C1 performs a write, so it sends the write to X, A, B. A is busy and drops the write, but
> the write still succeeds at quorum because X and B acknowledge it.
> t3: C2 performs a read. It has not yet switched, so it chooses A and C. Neither received
> the write from t2, so null is returned.
> t4: C2 switches ownership.
> This could be solved by continuing writes to the old replica for some time (maybe ring
> delay) after the ownership changes.
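The t1-t4 interleaving in the description can be checked mechanically. This sketch (hypothetical names, plain Python, not Cassandra code) lists every pair of a write quorum drawn from C1's switched view {X, A, B} and a read quorum drawn from C2's unswitched view {A, B, C} that share no replica. Each such pair is a possible stale read.

```python
from itertools import combinations

# Hypothetical model of token range 0-50: RF = 3, QUORUM = 2.
OLD_REPLICAS = {"A", "B", "C"}   # C2's view (not yet switched)
NEW_REPLICAS = {"X", "A", "B"}   # C1's view (already switched)
QUORUM = 2


def quorums(replicas, size):
    """Every subset of `replicas` with `size` members, in sorted order."""
    return [frozenset(q) for q in combinations(sorted(replicas), size)]


def disjoint_pairs(write_replicas, read_replicas, size):
    """(write acks, read set) pairs with no common replica, i.e. possible stale reads."""
    return [(w, r)
            for w in quorums(write_replicas, size)
            for r in quorums(read_replicas, size)
            if not (w & r)]


# Coordinators disagree on ownership: quorums can miss each other.
for w, r in disjoint_pairs(NEW_REPLICAS, OLD_REPLICAS, QUORUM):
    print(sorted(w), sorted(r))
# ['A', 'X'] ['B', 'C']
# ['B', 'X'] ['A', 'C']   <- the t2/t3 case: write acked by X and B, read from A and C

# Coordinators agree on ownership: 2 + 2 > 3, so every pair overlaps.
print(disjoint_pairs(NEW_REPLICAS, NEW_REPLICAS, QUORUM))  # []
```

The second check is the usual quorum-overlap argument: it only holds when both coordinators compute quorums over the same replica set, which is exactly what fails between t1 and t4.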



--
This message was sent by Atlassian JIRA
(v6.2#6252)
