cassandra-commits mailing list archives

From "Sergio Bossa (JIRA)" <j...@apache.org>
Subject [jira] [Commented] (CASSANDRA-5062) Support CAS
Date Wed, 27 Feb 2013 13:13:15 GMT

    [ https://issues.apache.org/jira/browse/CASSANDRA-5062?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13588303#comment-13588303 ]

Sergio Bossa commented on CASSANDRA-5062:
-----------------------------------------

[~jbellis]

{quote}This is not correct for Paxos. (Not sufficiently familiar with ZAB to comment there){quote}

Right, I was talking about Zab, which does exactly that to improve liveness and performance.

{quote}What does this 2PC-that-avoids-lost-acks look like?{quote}

Well, given my lack of familiarity with Cassandra internals, I may be missing something here,
so let's be clear about the lost-ack problem: as I understand it, a lost ack is what happens
when the coordinator node sends a QUORUM request and fails before getting the ack back, leaving
the status of the request uncertain. Please correct me if I'm wrong here.
Stated this way, the problem can be overcome with a Zab-like 2PC: once the coordinator
gets the acks from the prepare phase, it can commit without having to wait for all acks, because
only committed values with the highest "commit id" will be (QUORUM) read. Then:
1) If the coordinator fails during the prepare phase (lost ack), nothing will be committed,
hence the previously committed value will be read; if the uncommitted value gets hinted/repaired,
it will remain just a tentative value.
2) If the coordinator fails after sending commits, the coordinator with the highest commit
id will take over and "realign" the followers.
3) If a partition happens, the coordinator with the minority of followers will refuse to operate
CAS (Paxos would behave exactly the same here).
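
To make cases 1 and 3 concrete, here is a minimal in-memory sketch in Python. It is only an
illustration under stated assumptions, not Zab and not anything in Cassandra: the names
Replica, cas_write and quorum_read are hypothetical, commits are applied synchronously, and
the takeover/realignment of case 2 is not modelled at all.

```python
from dataclasses import dataclass
from typing import List, Optional, Tuple


@dataclass
class Replica:
    # Hypothetical replica state, not Cassandra's storage model.
    committed: Tuple[int, Optional[str]] = (0, None)  # (commit id, value)
    tentative: Optional[Tuple[int, str]] = None       # prepared, not yet committed

    def prepare(self, commit_id: int, value: str) -> bool:
        # Ack only proposals newer than what is already committed.
        if commit_id <= self.committed[0]:
            return False
        self.tentative = (commit_id, value)
        return True

    def commit(self, commit_id: int) -> None:
        # Promote the tentative value only if it belongs to this commit id.
        if self.tentative is not None and self.tentative[0] == commit_id:
            self.committed = self.tentative
            self.tentative = None


def quorum_size(n: int) -> int:
    return n // 2 + 1


def cas_write(replicas: List[Replica], reachable: List[int],
              commit_id: int, value: str) -> bool:
    # Case 3: a coordinator that can only reach a minority refuses to operate.
    if len(reachable) < quorum_size(len(replicas)):
        return False
    acks = [i for i in reachable if replicas[i].prepare(commit_id, value)]
    if len(acks) < quorum_size(len(replicas)):
        return False
    for i in acks:
        replicas[i].commit(commit_id)
    return True


def quorum_read(replicas: List[Replica], reachable: List[int]) -> Optional[str]:
    # Only committed values are visible; the highest commit id wins.
    if len(reachable) < quorum_size(len(replicas)):
        raise RuntimeError("no quorum reachable")
    newest = max((replicas[i].committed for i in reachable), key=lambda c: c[0])
    return newest[1]


replicas = [Replica() for _ in range(3)]
first_ok = cas_write(replicas, [0, 1, 2], 1, "alice")

# Case 1: the coordinator dies after a single prepare reached replica 0.
replicas[0].prepare(2, "bob")
value_after_crash = quorum_read(replicas, [0, 1])

# Case 3: a coordinator partitioned with one follower out of three.
minority_ok = cas_write(replicas, [2], 3, "carol")
```

In this sketch the half-prepared "bob" stays tentative on replica 0 and is never returned by
a read, which is the behaviour described in case 1.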

Does it make sense to you?

Obviously I may be missing some corner case, and above all, I'm not sure how comfortably
this could be implemented in Cassandra (lack of knowledge again), so take my comments just
as food for thought.
                
> Support CAS
> -----------
>
>                 Key: CASSANDRA-5062
>                 URL: https://issues.apache.org/jira/browse/CASSANDRA-5062
>             Project: Cassandra
>          Issue Type: New Feature
>          Components: API, Core
>            Reporter: Jonathan Ellis
>             Fix For: 2.0
>
>
> "Strong" consistency is not enough to prevent race conditions.  The classic example is
> user account creation: we want to ensure usernames are unique, so we only want to signal account
> creation success if nobody else has created the account yet.  But naive read-then-write allows
> clients to race and both think they have a green light to create.
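
The race in the issue description can be sketched with a toy, single-process store in Python.
This is an assumption-laden illustration only: AccountStore and insert_if_absent are
hypothetical names, and a local lock stands in for whatever distributed protocol would provide
the atomic check in a real cluster.

```python
import threading


class AccountStore:
    """Toy in-memory store; hypothetical, not a Cassandra API."""

    def __init__(self):
        self._users = {}
        self._lock = threading.Lock()

    def read(self, username):
        return self._users.get(username)

    def write(self, username, owner):
        self._users[username] = owner

    def insert_if_absent(self, username, owner):
        # Check and write happen as one atomic step (compare-and-set).
        with self._lock:
            if username in self._users:
                return False
            self._users[username] = owner
            return True


# Naive read-then-write: both clients read before either one writes.
naive = AccountStore()
a_sees_free = naive.read("jdoe") is None
b_sees_free = naive.read("jdoe") is None
if a_sees_free:
    naive.write("jdoe", "client-a")
if b_sees_free:
    naive.write("jdoe", "client-b")
naive_successes = [a_sees_free, b_sees_free]  # both believe they created it

# CAS: the second attempt is rejected because the name is already taken.
cas = AccountStore()
cas_successes = [
    cas.insert_if_absent("jdoe", "client-a"),
    cas.insert_if_absent("jdoe", "client-b"),
]
```

With the naive interleaving both clients report success and the second write silently
overwrites the first; with the atomic check only one client is told it created the account.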

--
This message is automatically generated by JIRA.
If you think it was sent incorrectly, please contact your JIRA administrators
For more information on JIRA, see: http://www.atlassian.com/software/jira
