cassandra-commits mailing list archives

From "Jonathan Ellis (JIRA)" <j...@apache.org>
Subject [jira] [Commented] (CASSANDRA-5062) Support CAS
Date Sun, 03 Mar 2013 06:45:13 GMT

    [ https://issues.apache.org/jira/browse/CASSANDRA-5062?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13591659#comment-13591659 ]

Jonathan Ellis commented on CASSANDRA-5062:
-------------------------------------------

I think we can fix the "partial commit" problem in my diagram 3.  The key is forcing CAS updates
to occur with strictly increasing column (cell) timestamps.  Then, we can rely on standard
"use the newest value" read-repair.  Specifically:
# Instead of replicas checking raw timeuuid order for propose/promise, they will require that
a new ballot be *larger* in the time component than an accepted ballot or mostRecentCommitted.
 (We do still want to use a timeuuid value though instead of a raw timestamp to guarantee
uniqueness across proposals.)
# Coordinators will generate ballots from the min timestamp in the new columns being proposed.
 Thus, any committed proposal will have a higher timestamp than any previously committed one.
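Here is a minimal sketch of the two rules above. It makes several assumptions:
* A ballot is modeled as a `(time_micros, unique_id)` pair that stands in for a timeuuid.
* `Ballot`, `ReplicaState`, `make_ballot` and `accepts` are hypothetical names, not Cassandra's classes.
* The replica compares only the time component, and requires it to be strictly larger than both the accepted ballot and mostRecentCommitted.
* The unique id only keeps concurrent proposals distinct.

```python
import uuid
from dataclasses import dataclass
from typing import List, Optional


@dataclass(frozen=True)
class Ballot:
    """Hypothetical stand-in for a timeuuid: a time component plus a unique id.
    Only time_micros is compared; unique_id just keeps proposals distinct."""
    time_micros: int
    unique_id: uuid.UUID


def make_ballot(proposed_cell_timestamps: List[int]) -> Ballot:
    """Coordinator rule (sketch): take the ballot time from the minimum
    timestamp among the new cells being proposed."""
    return Ballot(min(proposed_cell_timestamps), uuid.uuid4())


@dataclass
class ReplicaState:
    accepted: Optional[Ballot] = None
    most_recent_committed: Optional[Ballot] = None

    def accepts(self, ballot: Ballot) -> bool:
        """Replica rule (sketch): the ballot's time component must be strictly
        larger than both the accepted ballot and mostRecentCommitted. An equal
        time component is rejected even though the unique ids differ."""
        for existing in (self.accepted, self.most_recent_committed):
            if existing is not None and ballot.time_micros <= existing.time_micros:
                return False
        return True


if __name__ == "__main__":
    replica = ReplicaState(most_recent_committed=Ballot(1_000, uuid.uuid4()))
    print(replica.accepts(make_ballot([1_000, 1_005])))  # False: ties on time component
    print(replica.accepts(make_ballot([1_001, 1_005])))  # True: 1001 > 1000
```

The tie case shows why a timeuuid alone is not enough here. Two ballots with distinct uuids but the same time component would both be "newer" under raw timeuuid order. Under this rule, only one of them can win.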

The good:
# No need for a hairy mess of CAS ballot order trumping timestamp order during hinted handoff (HH),
anti-entropy repair (AES), or read repair (RR).  Much easier if ballot and timestamp order are the same.
# Non-CAS ops can be mixed in with CAS ones with sane results (as long as potentially concurrent
ones are all CAS, of course).
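With ballot order and timestamp order aligned, catching up a replica that missed a commit needs nothing beyond ordinary "newest value wins" reconciliation. This is a minimal sketch under these assumptions:
* A row is a dict mapping column name to a `(value, timestamp)` cell.
* `reconcile` is a hypothetical helper, not Cassandra's internal API.
* Timestamp ties are not modeled.

```python
from typing import Dict, Tuple

Cell = Tuple[str, int]  # (value, timestamp in microseconds)
Row = Dict[str, Cell]   # column name -> cell


def reconcile(*replica_rows: Row) -> Row:
    """Last-write-wins merge, column by column: keep the cell with the highest
    timestamp. Ties are not modeled here; the first cell seen is kept."""
    merged: Row = {}
    for row in replica_rows:
        for column, (value, ts) in row.items():
            if column not in merged or ts > merged[column][1]:
                merged[column] = (value, ts)
    return merged


if __name__ == "__main__":
    # Replica B missed the CAS commit whose cells carry timestamp 2_000.
    replica_a = {"owner": ("bob", 2_000)}
    replica_b = {"owner": ("alice", 1_000)}
    print(reconcile(replica_a, replica_b))  # {'owner': ('bob', 2000)}
    # A later non-CAS write still wins by plain timestamp order.
    replica_c = {"owner": ("carol", 3_000)}
    print(reconcile(replica_a, replica_b, replica_c))  # {'owner': ('carol', 3000)}
```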

The bad:
# Just the obvious (big) one: we're rate-limited by both clock resolution and clock skew.
 But, this is reasonable for our goals for 2.0.  And I'm not actually sure it's even possible
to avoid, if we want to allow mixing CAS and non-CAS ops in the same CF (see "hairy mess"
above).
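To make the rate limit concrete: each accepted ballot needs a strictly larger time component than the previous one. So at most one proposal per clock tick can succeed on a given replica, and a coordinator whose clock lags behind is rejected until its clock catches up.

This toy replay (`accepted_sequence` is hypothetical) simplifies heavily:
* Clock ticks are integers.
* Proposals arrive in one order at a single replica.
* Quorums and the commit step are ignored.

```python
from typing import List


def accepted_sequence(ballot_times: List[int]) -> List[int]:
    """Replay proposals in arrival order against one replica and return the
    ballot times that pass the strictly-increasing time check."""
    accepted: List[int] = []
    for t in ballot_times:
        if not accepted or t > accepted[-1]:
            accepted.append(t)
    return accepted


if __name__ == "__main__":
    # Three coordinators propose in tick 5, one in tick 6, then a coordinator
    # whose clock lags (tick 4), then one in tick 7.
    print(accepted_sequence([5, 5, 5, 6, 4, 7]))  # [5, 6, 7]
```

The number of successful ballots can never exceed the number of distinct ticks seen. That is the clock-resolution bound. The rejected tick-4 proposal is the clock-skew cost.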

Notes:
# Rejecting "newer" proposals w/ equal time components slows us down (we return false and
client has to try again w/ a newer ballot) but does not compromise correctness.
# Different CAS ops may operate on different sets of columns, so read-repairing for one CAS
op does not mean that we've caught up the affected replicas entirely, but it does mean we've
caught them up for the columns being checked this time, which is what we care about.
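The two notes can be sketched as follows, under these assumptions:
* `cas_with_retry` and `repair_checked_columns` are hypothetical helpers, not Cassandra's API.
* `clock` is any callable that returns the current time component.
* A cell is a `(value, timestamp)` pair.

The first helper models a client retrying with a newer ballot after a rejection. The second models read repair that only covers the columns a given CAS op checked.

```python
from typing import Callable, Dict, Set, Tuple

Cell = Tuple[str, int]  # (value, timestamp)


def cas_with_retry(last_time: int, clock: Callable[[], int],
                   max_attempts: int = 10) -> Tuple[int, int]:
    """Note 1: a ballot whose time component does not exceed last_time is
    rejected, and the client tries again with a freshly generated ballot.
    Returns (accepted ballot time, attempts used)."""
    for attempt in range(1, max_attempts + 1):
        ballot_time = clock()
        if ballot_time > last_time:
            return ballot_time, attempt
    raise TimeoutError(f"no ballot newer than {last_time} after {max_attempts} attempts")


def repair_checked_columns(stale: Dict[str, Cell], fresh: Dict[str, Cell],
                           checked: Set[str]) -> Dict[str, Cell]:
    """Note 2: repair for one CAS op only brings the columns it checked up to
    date (newest timestamp wins); all other columns are left untouched."""
    repaired = dict(stale)
    for column in checked:
        if column in fresh and (column not in stale or fresh[column][1] > stale[column][1]):
            repaired[column] = fresh[column]
    return repaired


if __name__ == "__main__":
    readings = iter([100, 100, 101])  # clock still on tick 100 for two attempts
    print(cas_with_retry(100, lambda: next(readings)))  # (101, 3)

    stale = {"email": ("a@example.com", 1), "name": ("A", 1)}
    fresh = {"email": ("b@example.com", 2), "name": ("B", 2)}
    print(repair_checked_columns(stale, fresh, {"email"}))
    # {'email': ('b@example.com', 2), 'name': ('A', 1)}
```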

                
> Support CAS
> -----------
>
>                 Key: CASSANDRA-5062
>                 URL: https://issues.apache.org/jira/browse/CASSANDRA-5062
>             Project: Cassandra
>          Issue Type: New Feature
>          Components: API, Core
>            Reporter: Jonathan Ellis
>             Fix For: 2.0
>
>         Attachments: half-baked commit 1.jpg, half-baked commit 2.jpg, half-baked commit 3.jpg
>
>
> "Strong" consistency is not enough to prevent race conditions.  The classic example is
> user account creation: we want to ensure usernames are unique, so we only want to signal account
> creation success if nobody else has created the account yet.  But naive read-then-write allows
> clients to race and both think they have a green light to create.
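To make the race concrete, here is a toy, single-process sketch that replays the bad interleaving deterministically. The `ToyStore` class and its `put_if_absent` method are hypothetical illustrations, not Cassandra's API.

In one process the check-and-write is trivially atomic. Providing that same atomicity across replicas is what the ballot/propose/promise machinery discussed in this ticket is for.

```python
from typing import Dict, List, Optional


class ToyStore:
    """Hypothetical single-process key/value store, used only to show the race."""

    def __init__(self) -> None:
        self.data: Dict[str, str] = {}

    def get(self, key: str) -> Optional[str]:
        return self.data.get(key)

    def put(self, key: str, value: str) -> None:
        self.data[key] = value

    def put_if_absent(self, key: str, value: str) -> bool:
        """Compare-and-set against "absent": the check and the write happen in
        one step, so only the first caller can succeed."""
        if key in self.data:
            return False
        self.data[key] = value
        return True


def naive_signup_race() -> List[bool]:
    """Read-then-write, with both reads interleaved before either write."""
    store = ToyStore()
    a_saw_free = store.get("alice") is None
    b_saw_free = store.get("alice") is None
    store.put("alice", "client-a")
    store.put("alice", "client-b")  # silently overwrites client-a's account
    return [a_saw_free, b_saw_free]


def cas_signup_race() -> List[bool]:
    """Same two clients, but each uses a single compare-and-set."""
    store = ToyStore()
    return [store.put_if_absent("alice", "client-a"),
            store.put_if_absent("alice", "client-b")]


if __name__ == "__main__":
    print(naive_signup_race())  # [True, True]: both clients think they won
    print(cas_signup_race())    # [True, False]: exactly one winner
```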

--
This message is automatically generated by JIRA.
If you think it was sent incorrectly, please contact your JIRA administrators
For more information on JIRA, see: http://www.atlassian.com/software/jira
