cassandra-commits mailing list archives

From "Cristian Opris (JIRA)" <j...@apache.org>
Subject [jira] [Commented] (CASSANDRA-5062) Support CAS
Date Mon, 25 Feb 2013 22:20:13 GMT

    [ https://issues.apache.org/jira/browse/CASSANDRA-5062?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13586371#comment-13586371 ]

Cristian Opris commented on CASSANDRA-5062:
-------------------------------------------

AFAICT from the Spinnaker paper, they only require ZK for fault-tolerant leader election, failure
detection, and possibly cluster membership (the lower-right box in the diagram in 4.1). The
rest of it is their actual data storage engine.

A few more comments:

1. Paxos can be made very efficient, particularly in stable operation scenarios. I believe
Zab effectively devolves into atomic broadcast (not even 2PC) with a stable leader. So you can
normally do writes with a single roundtrip, just like now.

2. There is a difference between what I described above and what Spinnaker does. I believe
they elect a leader for the entire replica group, while my description assumes one full Paxos
instance per row write. I'm not fully clear atm how this would work, but I believe even that
can be optimized to a single roundtrip per write in normal operation (I believe it's in one
of Google's papers that they piggyback the commit on the next proposal, for example).

Off the top of my head: the coordinator assumes one of the replicas is the most up-to-date and
attempts to use it as leader. The replica starts a Paxos round, attaching the write payload. If
the proposal is accepted by a majority, the replica can send the commit, opportunistically
attaching further proposals to it. If the Paxos round fails (or a number of rounds fail), it's
likely the replica is behind on many rows, so the coordinator switches to another replica.
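
The flow described above can be sketched as a toy in-memory simulation. This is only an
illustration under loud assumptions, not a safe or complete Paxos: the round is collapsed to a
single accept phase with the payload attached, there is no prepare phase and no recovery of
previously accepted values, and the commit is assumed to be piggybacked so it is not counted as
a round trip. The names Replica, lead_round and coordinate_write are hypothetical and do not
correspond to anything in Cassandra.

```python
from dataclasses import dataclass, field


@dataclass
class Replica:
    name: str
    ballots: dict = field(default_factory=dict)    # row -> highest ballot seen
    committed: dict = field(default_factory=dict)  # row -> committed value

    def accept(self, row, ballot):
        """Accept a proposal only if its ballot is newer than any seen for the row."""
        if ballot <= self.ballots.get(row, 0):
            return False
        self.ballots[row] = ballot
        return True

    def commit(self, row, ballot, value):
        self.ballots[row] = max(ballot, self.ballots.get(row, 0))
        self.committed[row] = value


def lead_round(leader, replicas, row, value):
    """One round trip: the leader proposes with the write payload attached."""
    ballot = leader.ballots.get(row, 0) + 1
    acks = sum(r.accept(row, ballot) for r in replicas)
    if acks <= len(replicas) // 2:
        return False  # no majority: the leader is probably behind on this row
    for r in replicas:
        r.commit(row, ballot, value)
    return True


def coordinate_write(replicas, row, value, rounds_per_leader=1):
    """Try each replica as leader in turn; return (leader name, round trips)."""
    round_trips = 0
    for leader in replicas:
        for _ in range(rounds_per_leader):
            round_trips += 1
            if lead_round(leader, replicas, row, value):
                return leader.name, round_trips
    raise RuntimeError("no replica could reach a majority")


# Stable case: the first replica tried is up to date, so one round trip suffices.
stable = [Replica("a"), Replica("b"), Replica("c")]
print(coordinate_write(stable, "row1", "v1"))

# Replica "a" is behind on row1, so its round fails and the coordinator moves on.
lagging = [Replica("a"), Replica("b", {"row1": 5}), Replica("c", {"row1": 5})]
print(coordinate_write(lagging, "row1", "v2"))
```

In the stable case the write costs one proposal round trip; a lagging leader costs one wasted
round before the coordinator switches, which is the trade-off the paragraph above is pointing at.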

Now this is all preliminary, as I haven't fully thought this through, but I think it's definitely
worth investigating. While it may be a complicated protocol, it has significant performance
advantages over locks. Just count how many roundtrips you'd need in the "wait chain" algorithm,
not to mention handling expired/orphaned locks.


                
> Support CAS
> -----------
>
>                 Key: CASSANDRA-5062
>                 URL: https://issues.apache.org/jira/browse/CASSANDRA-5062
>             Project: Cassandra
>          Issue Type: New Feature
>          Components: API, Core
>            Reporter: Jonathan Ellis
>             Fix For: 2.0
>
>
> "Strong" consistency is not enough to prevent race conditions.  The classic example is
> user account creation: we want to ensure usernames are unique, so we only want to signal account
> creation success if nobody else has created the account yet.  But naive read-then-write allows
> clients to race and both think they have a green light to create.
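
The race in the issue description can be illustrated with a small in-memory sketch. Everything
here is hypothetical: AccountStore stands in for a table keyed by username, the two clients are
interleaved by hand so the race is deterministic, and the lock is only a local stand-in for
whatever mechanism would make the check and the write atomic across replicas.

```python
import threading


class AccountStore:
    """Hypothetical in-memory stand-in for a table keyed by username."""

    def __init__(self):
        self._rows = {}
        self._lock = threading.Lock()

    def read(self, username):
        return self._rows.get(username)

    def write(self, username, owner):
        self._rows[username] = owner

    def insert_if_absent(self, username, owner):
        """Compare-and-set: the check and the write happen as one atomic step."""
        with self._lock:
            if username in self._rows:
                return False
            self._rows[username] = owner
            return True


# Naive read-then-write: both clients read before either one writes.
naive = AccountStore()
alice_sees_free = naive.read("jsmith") is None
bob_sees_free = naive.read("jsmith") is None
if alice_sees_free:
    naive.write("jsmith", "alice")
if bob_sees_free:
    naive.write("jsmith", "bob")  # silently overwrites alice's row
naive_winners = [alice_sees_free, bob_sees_free]

# CAS: only the first insert succeeds, so only one client is told "created".
cas = AccountStore()
cas_winners = [
    cas.insert_if_absent("jsmith", "alice"),
    cas.insert_if_absent("jsmith", "bob"),
]

print(naive_winners, naive.read("jsmith"))
print(cas_winners, cas.read("jsmith"))
```

With read-then-write both clients believe they created the account; with the conditional insert
exactly one of them does.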

--
This message is automatically generated by JIRA.
If you think it was sent incorrectly, please contact your JIRA administrators
For more information on JIRA, see: http://www.atlassian.com/software/jira
