cassandra-commits mailing list archives

From "Cristian Opris (JIRA)" <j...@apache.org>
Subject [jira] [Commented] (CASSANDRA-5062) Support CAS
Date Mon, 25 Feb 2013 21:14:14 GMT

    [ https://issues.apache.org/jira/browse/CASSANDRA-5062?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13586290#comment-13586290 ]

Cristian Opris commented on CASSANDRA-5062:
-------------------------------------------

This shouldn't be too complicated with Paxos leader election, very similar to Spinnaker.

I don't think it requires changing the read/write paths at the lower level, at least not significantly.

Assume for the sake of simplicity that we use a column prefix to encode the version.

The leader elected should always be the one that has the latest version.

This allows the leader to perform read-modify-write (conditional update) locally and do a
simple quorum write to propagate that if successful.

The leader can also increment the version sequentially.

Conflicting writes from other replicas cannot succeed because any node that wants to write
needs to get itself elected leader first.
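
A minimal sketch of the leader-side conditional update described above, assuming an
in-memory store where each key maps to a (version, value) pair. The names Replica and
leader_cas are hypothetical and not part of Cassandra; leader election itself is assumed
to have already happened, and a real implementation would also have to deal with a write
that reaches fewer than a quorum of replicas.

```python
from dataclasses import dataclass, field
from typing import Dict, List, Optional, Tuple


@dataclass
class Replica:
    """Hypothetical in-memory replica: key -> (version, value)."""
    name: str
    store: Dict[str, Tuple[int, Optional[str]]] = field(default_factory=dict)
    up: bool = True

    def read(self, key: str) -> Tuple[int, Optional[str]]:
        # Version 0 with value None stands for "never written".
        return self.store.get(key, (0, None))

    def write(self, key: str, version: int, value: str) -> bool:
        """Apply the write if it is newer; return True as an acknowledgement."""
        if not self.up:
            return False
        current_version, _ = self.read(key)
        if version > current_version:
            self.store[key] = (version, value)
        return True


def leader_cas(leader: Replica, replicas: List[Replica], key: str,
               expected: Optional[str], new_value: str) -> bool:
    """Read-modify-write on the leader, then a quorum write to propagate it."""
    version, current = leader.read(key)
    if current != expected:
        return False                      # condition failed, nothing is written
    next_version = version + 1            # the leader increments sequentially
    acks = sum(1 for r in replicas if r.write(key, next_version, new_value))
    return acks >= len(replicas) // 2 + 1


a, b, c = Replica("a"), Replica("b"), Replica("c")
replicas = [a, b, c]
first = leader_cas(a, replicas, "user:alice", None, "client-1")
second = leader_cas(a, replicas, "user:alice", None, "client-2")
```

Here the first call succeeds and the second is rejected, because the leader already holds
version 1 of the key when the second condition is checked.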

Since we do quorum writes, not all replicas will have the full sequence of versions, but regular
anti-entropy (read-repair) on quorum reads should take care of that.
  
If the leader fails, the newly elected leader will necessarily be the one that has the latest
write, so it can continue to do CAS locally.
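
A rough sketch of the selection rule only (this is not Paxos or Zab, and the function
name elect_leader is hypothetical): among a quorum of reachable replicas, pick the one
holding the highest version. Because every successful write reached a quorum, any quorum
of respondents overlaps it, so the highest version seen is the latest committed one.

```python
from typing import Dict, Optional


def elect_leader(versions: Dict[str, Optional[int]]) -> Optional[str]:
    """versions maps replica name -> latest version held, or None if unreachable.

    Returns the name of the new leader, or None when no quorum is reachable.
    """
    reachable = {name: v for name, v in versions.items() if v is not None}
    quorum = len(versions) // 2 + 1
    if len(reachable) < quorum:
        return None
    # Iterating in sorted name order makes ties deterministic:
    # max() keeps the first replica that has the highest version.
    return max(sorted(reachable), key=lambda name: reachable[name])


# Old leader "a" has failed; "b" saw the latest write, "c" is one version behind.
new_leader = elect_leader({"a": None, "b": 7, "c": 6})
```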

Anti-entropy should also take care of recovery and catch-up of a replica just like now.

I believe this can all be done on top of existing functionality without major changes to read/write
paths.

You could also reuse the Zab algorithm from ZK for expediency without having to use
the entire ZK codebase.



                
> Support CAS
> -----------
>
>                 Key: CASSANDRA-5062
>                 URL: https://issues.apache.org/jira/browse/CASSANDRA-5062
>             Project: Cassandra
>          Issue Type: New Feature
>          Components: API, Core
>            Reporter: Jonathan Ellis
>             Fix For: 2.0
>
>
> "Strong" consistency is not enough to prevent race conditions.  The classic example is
> user account creation: we want to ensure usernames are unique, so we only want to signal account
> creation success if nobody else has created the account yet.  But naive read-then-write allows
> clients to race and both think they have a green light to create.
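
The race in the quoted description can be illustrated with a small single-process sketch.
The dictionaries and function names below are hypothetical stand-ins for the database and
its clients, and the interleaving is written out by hand rather than produced by real
concurrency.

```python
accounts = {}


def naive_check(username):
    """Step 1 of naive read-then-write: is the name still free?"""
    return username not in accounts


def naive_create(username, owner):
    """Step 2 of naive read-then-write: write unconditionally."""
    accounts[username] = owner


# Both clients read before either one writes.
ok_1 = naive_check("alice")
ok_2 = naive_check("alice")
if ok_1:
    naive_create("alice", "client-1")
if ok_2:
    naive_create("alice", "client-2")   # silently overwrites client-1


def create_if_absent(store, username, owner):
    """Conditional update: the check and the write happen as one step."""
    if username in store:
        return False
    store[username] = owner
    return True


cas_accounts = {}
cas_1 = create_if_absent(cas_accounts, "alice", "client-1")
cas_2 = create_if_absent(cas_accounts, "alice", "client-2")
```

With the naive version both clients believe they created the account; with the
conditional version only the first one is told it succeeded. In this sketch the check and
write are indivisible only because nothing else runs in between, which is the guarantee
the ticket asks the database to provide.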

--
This message is automatically generated by JIRA.
If you think it was sent incorrectly, please contact your JIRA administrators
For more information on JIRA, see: http://www.atlassian.com/software/jira
