cassandra-user mailing list archives

From Sylvain Lebresne <sylv...@datastax.com>
Subject Re: New Chain for : Does Cassandra use vector clocks
Date Thu, 24 Feb 2011 13:41:46 GMT
On Thu, Feb 24, 2011 at 3:22 AM, Anthony John <chirayithaj@gmail.com> wrote:

> Apologies : For some reason my response on the original mail keeps bouncing
> back, thus this new one!
> > From the other hand, the same article says:
> > "For conditional writes to work, the condition must be evaluated at all update
> > sites before the write can be allowed to succeed."
> >
> > This means, that when doing such an update CL=ALL must be used
>
> Sorry, but I am confused by that entire thread!
>
> Questions:-
> 1. Does Cassandra implement any kind of data locking - at any granularity
> whether it be row/colF/Col ?
>

No locking, no.


> 2. If the answer to 1 above is NO! - how does CL ALL prevent conflicts.
> Concurrent updates on exactly the same piece of data on different nodes can
> still mess each other up, right ?
>

Not sure why you are singling out CL.ALL specifically. But at any CL, updating
the same piece of data means updating the same column value. In that case, the
resolution rules are the following:
  - If the updates have different timestamps, keep the one with the higher
timestamp. That is, the more recent of the two updates wins.
  - If the timestamps are the same, compare the values (byte comparison) and
keep the highest value. This is just to break ties in a consistent manner.
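
To make the two rules concrete, here is a minimal sketch in Python. It is not
Cassandra's actual code: the `Column` type and the `reconcile` function are
hypothetical names made up for illustration, and the sketch only covers the two
rules listed above.

```python
from typing import NamedTuple


class Column(NamedTuple):
    """Hypothetical stand-in for one version of a column."""
    name: bytes
    value: bytes
    timestamp: int  # supplied by the client, e.g. microseconds since epoch


def reconcile(a: Column, b: Column) -> Column:
    """Pick the surviving version among two updates to the same column."""
    # Rule 1: different timestamps, the higher (more recent) one wins.
    if a.timestamp != b.timestamp:
        return a if a.timestamp > b.timestamp else b
    # Rule 2: same timestamp, compare the raw bytes and keep the highest
    # value, so every replica breaks the tie the same way.
    return a if a.value >= b.value else b
```

Because the result does not depend on the order of the arguments, every replica
that sees both updates settles on the same version, whichever arrived first.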

So if you do two truly concurrent updates (that is, from two places at the
same instant), then you'll end up with one of the updates. This is at the
column level.

However, if that simple conflict detection/resolution mechanism is not good
enough for some of your use cases and you need to keep two concurrent
updates, it is easy enough. Just make sure that the updates don't end up in
the same column. This is easily achieved by appending some unique identifier
to the column name, for instance. And when reading, do a slice and reconcile
whatever you get back with whatever logic makes sense. If you do that,
congrats, you've roughly emulated what vector clocks would do. Btw, no
locking or anything needed.
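
A rough sketch of that pattern in Python, using a plain dict as a stand-in for
one row (column name -> value) rather than any real client API. The helper
names, the ":" separator, and the set-union merge are all assumptions picked
for the example; your own reconcile logic would go in place of `merge_items`.

```python
import uuid


def write_sibling(row, base_name, value, unique_id=None):
    """Write to a column whose name ends in a unique id, so two concurrent
    writers never land in the same column and never overwrite each other."""
    suffix = unique_id or uuid.uuid4().hex
    row[f"{base_name}:{suffix}"] = value


def read_siblings(row, base_name):
    """Emulate a slice: return every value whose column name starts with
    the shared prefix, in column-name order."""
    prefix = base_name + ":"
    return [value for name, value in sorted(row.items())
            if name.startswith(prefix)]


def merge_items(siblings):
    """Application-specific reconcile step (here: union of all items)."""
    merged = set()
    for items in siblings:
        merged |= set(items)
    return merged


row = {}  # hypothetical in-memory row
write_sibling(row, "cart", ["book"], unique_id="client-a")
write_sibling(row, "cart", ["pen"], unique_id="client-b")
cart = merge_items(read_siblings(row, "cart"))  # both updates survive
```

The two writes are kept side by side, and the decision about how to combine
them is pushed to read time, in the application.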

In my experience, for most things the timestamp resolution is enough. If the
same user updates their profile picture twice on your web site at the same
microsecond, it's usually fine to end up with one of the two pictures. In
the rare case where you need something more specific, using the Cassandra
data model usually solves the problem easily. The reason for not having
vector clocks in Cassandra is that so far, we haven't really found many
examples where this is not the case.

--
Sylvain
