cassandra-commits mailing list archives

From "Yang Yang (JIRA)" <>
Subject [jira] [Commented] (CASSANDRA-2774) one way to make counter delete work better
Date Wed, 15 Jun 2011 16:53:48 GMT


Yang Yang commented on CASSANDRA-2774:

you are right that "it just cannot return 1 (at) *that time*": 0 or 2 is not a stable value,
only the value that the system had from some past snapshot in time.

but it will eventually come to answer 1:

since our edge case above assumes that B has not got the deletion yet, the leader in the second
increment cannot be A, because otherwise B must have got the deletion from A, since on A the
increment comes later. so B was the leader in the second increment.

C now has the new epoch. let's say A's second increment reaches C (through repair, since
A is not the leader in the second increment): this increment has the new epoch, so it will be
accepted by C. if B's second increment reaches C, it belongs to the old epoch, so it will be rejected.

B is still on the old epoch. after the second increment, B's count is 2 in the old
epoch. but when A's increment goes to B through repair, or is reconciled with B in a read, the
result is going to be 1. if C's deletion goes to B, B is going to be brought more up to date,
to a value of 0 in the new epoch.
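
The convergence argument above can be checked with a small model. This is only a sketch under stated assumptions, not the code in counter_delete.diff: `CounterState` and `reconcile` are hypothetical names, each replica is reduced to a single (epoch, count) pair, and the same-epoch rule (keep the larger count) is an assumed stand-in for the real per-shard reconciliation. The one rule taken from the discussion is that a higher epoch always wins over a lower one.

```python
from dataclasses import dataclass
from itertools import permutations


@dataclass(frozen=True)
class CounterState:
    """Hypothetical, simplified view of one replica's counter."""
    epoch: int  # bumped by every delete
    count: int  # total accumulated within that epoch


def reconcile(left: CounterState, right: CounterState) -> CounterState:
    """Merge two replica states.

    A state from an older epoch is always rejected in favour of a newer one.
    Within one epoch the larger count is kept (an assumption of this sketch).
    """
    if left.epoch != right.epoch:
        return left if left.epoch > right.epoch else right
    return left if left.count >= right.count else right


# The edge case from the comment, with the old epoch as 0 and the new one as 1.
a = CounterState(epoch=1, count=1)  # A: saw the delete, holds the second increment
b = CounterState(epoch=0, count=2)  # B: never saw the delete, counted both increments
c = CounterState(epoch=1, count=0)  # C: saw the delete only


def converge(states):
    """Fold a sequence of states with reconcile, in the given order."""
    result = states[0]
    for state in states[1:]:
        result = reconcile(result, state)
    return result


# Every order in which the three states can meet ends in the same state.
outcomes = {converge(order) for order in permutations([a, b, c])}
print(outcomes)
```

In this model, `reconcile(b, a)` gives a count of 1 in the new epoch, `reconcile(b, c)` gives 0 in the new epoch, and once all three states have met, every merge order ends at epoch 1 with count 1.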

the above analysis does not go through all possible scenarios. to give a definitive proof
of the conjecture that "all nodes return *the* ordering given by client, in case of quorum
read/write", I need to think more.

but as I stated in my last comment, at least we can be sure that the new approach guarantees
*some* common agreement eventually. it would be nice if we achieve *the* agreement in case
of quorum, but that's not my main argument.

> one way to make counter delete work better
> ------------------------------------------
>                 Key: CASSANDRA-2774
>                 URL:
>             Project: Cassandra
>          Issue Type: New Feature
>    Affects Versions: 0.8.0
>            Reporter: Yang Yang
>         Attachments: counter_delete.diff
> current Counter does not work with delete, because different merging orders of sstables
would produce different results, for example:
> add 1
> delete 
> add 2
> if the merging happens in 1--2, (1,2)--3 order, the result we see will be 2
> if the merging is 1--3, (1,3)--2, the result will be 3.
> the issue is that delete currently cannot separate adds made before the delete from adds made
after it. a delete is supposed to create a completely new incarnation of the counter, or a
new "lifetime", or "epoch". the new approach uses the concept of an "epoch number", so that
each delete bumps up the epoch number. since each write is replicated (replicate on write
is almost always enabled in practice; if this is a concern, we could further force ROW in
case of delete), the epoch number is global to a replica set
> changes are attached. existing tests pass fine; some tests are modified since the semantics
have changed a bit. some cql tests do not pass in the original 0.8.0 source, so that's not the
fault of this change.
> see details at
> the goal of this is to make delete work (at least with consistent behavior; yes, in case
of a long network partition the behavior is not ideal, but it's consistent with the definition
of logical clock), so that we could have expiring Counters

This message is automatically generated by JIRA.
For more information on JIRA, see:

