cassandra-commits mailing list archives

From "Sylvain Lebresne (JIRA)" <j...@apache.org>
Subject [jira] [Commented] (CASSANDRA-12126) CAS Reads Inconsistencies
Date Wed, 19 Apr 2017 08:58:41 GMT

    [ https://issues.apache.org/jira/browse/CASSANDRA-12126?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15974342#comment-15974342 ]

Sylvain Lebresne commented on CASSANDRA-12126:
----------------------------------------------

bq. What is the distinction you are proposing?

Not sure; I think we don't define operation visibility the same way. What I'm saying
is that "if an operation has a visible outcome, then that outcome should be visible (to serial
operations) to any subsequent operation (so as soon as the operation returns to the client,
if you will)". In particular, if a serial read follows a serial write (meaning that it started
after the write returned, even with a timeout), then if the write has any effect, the read
should see it.

Note that when you get a timeout on the initial write, you don't know whether the write has been
applied or not, but the whole point of a serial read is to be able to unequivocally decide
what that outcome was. If we can't guarantee that, i.e. if there is no way to observe whether a
timed-out write has been applied or not, then I'm not sure how one would use LWT in the first place.
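
To make the rule above concrete, here is a minimal sketch of a checker over a recorded history.
It is only an illustration: the names SerialRead and violates_visibility are hypothetical, and it
assumes non-overlapping serial reads and no other write touching the same partition. Under those
assumptions, once a serial read started after the write returned has missed the write, no later
serial read may see it.

```python
from typing import List, NamedTuple


class SerialRead(NamedTuple):
    start: float      # time at which the serial read was started
    saw_write: bool   # whether it observed the effect of the timed-out write


def violates_visibility(write_returned_at: float, reads: List[SerialRead]) -> bool:
    """Return True if a read misses the write and a later read then sees it.

    Only reads started after the write returned (timeout included) are
    considered, since the rule says nothing about concurrent reads.
    """
    later = sorted(
        (r for r in reads if r.start > write_returned_at),
        key=lambda r: r.start,
    )
    missed = False
    for read in later:
        if not read.saw_write:
            missed = True
        elif missed:
            # An earlier read decided "not applied", this one says "applied".
            return True
    return False
```

Both "every read sees the write" and "no read sees the write" are accepted; only the mixed
history, where the outcome flips after the fact, is flagged.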


> CAS Reads Inconsistencies 
> --------------------------
>
>                 Key: CASSANDRA-12126
>                 URL: https://issues.apache.org/jira/browse/CASSANDRA-12126
>             Project: Cassandra
>          Issue Type: Bug
>          Components: Coordination
>            Reporter: sankalp kohli
>            Assignee: Stefan Podkowinski
>
> While looking at the CAS code in Cassandra, I found a potential issue with CAS Reads.
Here is how it can happen with RF=3:
> 1) You issue a CAS Write and it fails in the propose phase. Machine A replies true to
a propose and saves the commit in the accepted field. The other two machines, B and C, do not
get to the accept phase. 
> Current state is that machine A has this commit in the paxos table as accepted but not committed
and B and C do not. 
> 2) Issue a CAS Read and it goes to only B and C. You won't be able to read the value written
in step 1. This step behaves as if nothing is in flight. 
> 3) Issue another CAS Read and it goes to A and B. Now we will discover that there is
something in flight from A and will propose and commit it with the current ballot. Now we can
read the value written in step 1 as part of this CAS read.
> If we skip step 3 and instead run step 4, we will never learn about the value written in
step 1. 
> 4) Issue a CAS Write and it involves only B and C. This will succeed and commit a different
value than step 1. The step 1 value will never be seen again and was never seen before. 
> Section 2.3 of Lamport's “Paxos Made Simple” paper talks
about this issue, which is how learners can find out whether a majority of the acceptors have accepted
the proposal. 
> In step 3, it is correct that we propose the value again since we don't know if it was
accepted by a majority of acceptors. When we ask a majority of acceptors, and one or more acceptors
but not a majority have something in flight, we have no way of knowing if it was accepted by a majority
of acceptors. So this behavior is correct. 
> However, we need to fix step 2, since it causes reads to not be linearizable with respect
to writes and other reads. In this case, we know that a majority of acceptors have no in-flight
commit, which means a majority confirms that nothing was accepted by a majority. I think we should
run a propose step here with an empty commit, and that will cause the write from step 1 to never
be visible afterwards. 
> With this fix, we will either see the data written in step 1 on the next serial read or will
never see it, which is what we want. 
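
Steps 1 to 3 of the description, and the proposed fix for step 2, can be reproduced with a toy
model. This is a sketch under stated assumptions, not Cassandra's implementation: Replica,
serial_read and run are hypothetical names, ballots are plain integers, every message is
delivered, and an in-flight proposal is only replayed when its ballot is newer than the newest
commit ballot reported by the contacted replicas.

```python
from dataclasses import dataclass
from typing import List, Optional, Tuple


@dataclass
class Replica:
    name: str
    promised: int = 0                                    # highest ballot promised
    accepted: Optional[Tuple[int, Optional[str]]] = None  # accepted, not yet committed
    committed: Optional[str] = None                      # last committed value
    last_commit: int = 0                                 # ballot of the last commit

    def prepare(self, ballot: int):
        if ballot <= self.promised:
            return False, None, self.last_commit
        self.promised = ballot
        return True, self.accepted, self.last_commit

    def propose(self, ballot: int, value: Optional[str]) -> bool:
        if ballot < self.promised:
            return False
        self.promised = ballot
        self.accepted = (ballot, value)
        return True

    def commit(self, ballot: int, value: Optional[str]) -> None:
        if value is not None:          # an empty commit leaves the data untouched
            self.committed = value
        self.last_commit = max(self.last_commit, ballot)
        if self.accepted is not None and self.accepted[0] <= ballot:
            self.accepted = None


def serial_read(quorum: List[Replica], ballot: int, propose_empty: bool) -> Optional[str]:
    replies = [r.prepare(ballot) for r in quorum]
    assert all(ok for ok, _, _ in replies)
    accepted = [acc for _, acc, _ in replies if acc is not None]
    newest_commit = max(lc for _, _, lc in replies)
    newest_accepted = max(accepted, key=lambda acc: acc[0], default=None)

    if newest_accepted is not None and newest_accepted[0] > newest_commit:
        value = newest_accepted[1]     # finish the in-flight proposal (step 3)
    elif propose_empty:
        value = None                   # proposed fix: run an empty round (step 2)
    else:
        return max(quorum, key=lambda r: r.last_commit).committed

    for r in quorum:
        r.propose(ballot, value)
    for r in quorum:
        r.commit(ballot, value)
    return max(quorum, key=lambda r: r.last_commit).committed


def run(propose_empty: bool):
    a, b, c = Replica("A"), Replica("B"), Replica("C")
    for r in (a, b, c):                # step 1: prepare succeeds everywhere...
        r.prepare(1)
    a.propose(1, "v1")                 # ...but only A accepts before the timeout
    first = serial_read([b, c], 2, propose_empty)   # step 2: read via B and C
    second = serial_read([a, b], 3, propose_empty)  # step 3: read via A and B
    return first, second
```

In this model, run(False) yields a first read that misses the step 1 value and a second read
that returns it, which is the inconsistency described above. With run(True), the empty round
in step 2 supersedes A's accepted proposal, so both reads agree that the write was not applied.
Step 4 is not modeled.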



--
This message was sent by Atlassian JIRA
(v6.3.15#6346)
