cassandra-user mailing list archives

From aaron morton <aa...@thelastpickle.com>
Subject Re: Cassandra atomicity/isolation/transaction in multithread counter updates
Date Mon, 18 Jun 2012 03:57:37 GMT
> I'm in a pseudo-deadlock 
BOOM BOOM ! :)

> (N.B. The updates require a read of the current value before the update write. Otherwise a counter column could be used, but in my opinion the problem still remains.)
Writes on the Cassandra server do not require a read.

> My simple question is: what happens when two (or more) threads try to update (increment)
the same integer column value of the same row in a column family?
Multiple values for the same column are deterministically resolved, so the actual order of the
interleaving on the server side does not matter.

Either thread in your example will compare the column it is trying to write with what is
in the memtable. The columns are then resolved as follows:
* a delete with a higher timestamp wins
* next, the column instance with the highest timestamp wins
* finally, the column instance with the greater byte value wins
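The rules above can be sketched as a pure function; the tuple layout (timestamp, value, is_delete) is illustrative, not Cassandra's internal representation:

```python
def reconcile(a, b):
    """Pick the winning version among two copies of the same column.

    Each column is modeled as a tuple: (timestamp, value, is_delete).
    """
    ts_a, val_a, del_a = a
    ts_b, val_b, del_b = b
    if ts_a != ts_b:
        return a if ts_a > ts_b else b   # highest timestamp wins
    if del_a != del_b:
        return a if del_a else b         # on a timestamp tie, the delete wins
    return a if val_a >= val_b else b    # finally, greater byte value wins

# The result is the same whichever order the writes arrive in:
print(reconcile((2, b"x", False), (1, b"y", False)))  # (2, b'x', False)
print(reconcile((1, b"y", False), (2, b"x", False)))  # (2, b'x', False)
```

Because `reconcile` is commutative, every replica converges on the same winner no matter how the writes interleave.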

In 1.1 the threads then try to put their shadow copy of the data that was in the memtable
back. If it has changed in the meantime, they read it again and retry the write.

If two write threads start at (roughly) the same time and try to apply their changes to the
memtable, one will win and the other will "redo" the write in memory. The order in which this
occurs is irrelevant.
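That retry loop can be modeled with a toy compare-and-swap slot; all names here are illustrative stand-ins, not Cassandra's actual classes:

```python
import threading

def reconcile(a, b):
    # (timestamp, value): highest timestamp wins
    return a if a[0] >= b[0] else b

class MemtableSlot:
    """Toy stand-in for one column's slot in the memtable."""

    def __init__(self):
        self._lock = threading.Lock()
        self.value = None

    def compare_and_set(self, expected, new):
        # Succeeds only if nobody changed the slot since we read it.
        with self._lock:
            if self.value is expected:
                self.value = new
                return True
            return False

def apply_write(slot, column):
    # Merge our column with whatever is there, then swap it back in;
    # if another writer got there first, re-read and redo the merge.
    while True:
        current = slot.value
        merged = column if current is None else reconcile(current, column)
        if slot.compare_and_set(current, merged):
            return

slot = MemtableSlot()
threads = [threading.Thread(target=apply_write, args=(slot, (ts, "v%d" % ts)))
           for ts in (1, 2)]
for t in threads:
    t.start()
for t in threads:
    t.join()
# Regardless of interleaving, the highest-timestamp column ends up in the slot.
```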

Cheers
  
-----------------
Aaron Morton
Freelance Developer
@aaronmorton
http://www.thelastpickle.com

On 16/06/2012, at 7:37 PM, Manuel Peli wrote:

> I'm in a pseudo-deadlock about Cassandra and atomicity/isolation/transaction arguments.
My simple question is: what happens when two (or more) threads try to update (increment) the
same integer column value of the same row in a column family? I've read something about row-level
isolation, but I'm not sure it is managed properly. Any suggestions? (N.B. The updates require
a read of the current value before the update write. Otherwise a counter column could be used,
but in my opinion the problem still remains.)
> 
> My personal idea is described next. Because it's a real-time analytics application, the
counter updates concern only the current hour, while previous hours remain unchanged. So I
think one way to avoid the problem would be to use an RDBMS layer for the current updates
(which supports ACID properties) and, when the hour expires, consolidate the data into Cassandra.
Is that right?
> 
> Also with the RDBMS layer, the transaction problem remains: some updates on different
column families are correlated, and if even one fails a rollback is needed. I know that
Cassandra doesn't support transactions, but I think that by playing with the replication factor
and write/read consistency levels the problem can be mitigated, eventually implementing an
application-level commit/rollback. I read something about ZooKeeper, but I guess that adds
complexity and latency.

