incubator-cassandra-user mailing list archives

From Vladimir Mosgalin <vmosga...@aintsys.com>
Subject Newbie question about writer/reader consistency
Date Mon, 26 Dec 2011 18:38:46 GMT
Hello everybody.

I am a developer of a finance-related application, and I'm currently evaluating
various NoSQL databases for our current goal: storing various views which show
the state of the system in different aspects after each transaction.

The write load seems to be bigger than a typical SQL database would handle
without problems: under a test load of tens of transactions per second, each
transaction generates changes in a dozen views, which adds up to hundreds of
messages per second in total. Each message ("change") for each view must be
stored, as well as the resulting view (generated as a kind of update of the old
view); this means multiple inserts and updates per message, which go as a single
transaction. So I started to look into NoSQL databases. I'm a bit puzzled by the
guarantees of atomicity and isolation that Cassandra provides, so my question is
about how to (if it is possible at all) attain the required level of consistency
in Cassandra. I've read various documents and introductions to Cassandra's data
model but still can't understand the basics of data consistency. This discussion
http://stackoverflow.com/questions/6033888/cassandra-atomicity-isolation-of-column-updates-on-a-single-row-on-on-single-n
makes me feel disappointed about consistency in Cassandra, but I wonder whether
there is a way to work around it.

The requirements are like this. There is one writer, which modifies two
"tables" (I'm sorry for using SQL terms, I just don't want to create
more confusion by mapping them into Cassandra terms at this stage). For
the first table, it's a simple insert; the index is a unique SCN which is
guaranteed to be larger than the previous one.

Let's say it inserts
SCN DATA
1   AAA
2   BBB
3   CCC

The goal for the client (reader) is to get all the data from SCN N to SCN M
without gaps. It is fine if it can't see the very latest SCN yet, that is, if it
gets "1:AAA" and "2:BBB" on the request "SCN: 1..END"; what is NOT fine is to
get something like "1:AAA" and "3:CCC". In other words, does Cassandra provide
consistency between writer and reader regarding the order of changes? Or under
some conditions (say, very fast writes - but always from a single writer - and
many concurrent reads or something) might it be possible to get that kind of gap?
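
To make the "no gaps" requirement concrete, here is a minimal sketch in Python
of the property a read result must satisfy. It is only an illustration of the
requirement and says nothing about what Cassandra actually guarantees; the
function names and the in-memory list standing in for the first table are
hypothetical.

```python
def rows_from(written, start_scn):
    """Rows the writer has inserted so far with SCN >= start_scn, in write order."""
    return [(scn, data) for scn, data in written if scn >= start_scn]


def is_gap_free(written, observed, start_scn):
    """True if `observed` is a prefix of what the writer inserted from start_scn on.

    A reader may lag behind the writer (a shorter prefix is fine), but it must
    never skip a row that was written before a row it did see.
    """
    expected = rows_from(written, start_scn)
    return observed == expected[:len(observed)]


# Hypothetical writer log: SCNs are unique and strictly increasing.
written = [(1, "AAA"), (2, "BBB"), (3, "CCC")]

print(is_gap_free(written, [(1, "AAA"), (2, "BBB")], 1))  # True: acceptable lag
print(is_gap_free(written, [(1, "AAA"), (3, "CCC")], 1))  # False: SCN 2 is missing
```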

The second question is similar, but on a bigger scale. The second table must be
modified in a more complicated way; both inserts and updates of old data are
required. Sometimes it's a few inserts and a few updates, which must be done
atomically - under no conditions should a reader be able to see the mid-state of
these inserts/updates. Fortunately, all these new changes will have a new key
(new SCNs), so if it were possible to just use a column in a separate table
which stores the "last safe SCN", it would work - but I have no faith that
Cassandra offers such a level of consistency. For example, let's say it works
like this

current last safe SCN: 1000

update (must be viewed as an atomic "transaction"):
SCN   DATA
1001  AAA
1002  BBB
800   1001
1003  DDD

new last safe SCN: 1003

Here, readers need a means to filter out lines with SCN>1000 until the writer is
done writing the "1003:DDD" line. They also need to filter out the "800:1001"
line because it references an SCN which is after the current "last safe" one.
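
The reader-side filter could be sketched in Python roughly as follows. This is
an assumption-laden illustration, not Cassandra code: the `Row` type, the
`ref_scn` field (my way of modelling "this line references another SCN") and
the function names are all hypothetical.

```python
from typing import NamedTuple, Optional


class Row(NamedTuple):
    scn: int
    data: str
    # Hypothetical field: the SCN this row refers to, if any
    # (for the "800:1001" line it would be 1001).
    ref_scn: Optional[int] = None


def visible(row, last_safe):
    """A row is visible only if its own SCN and any SCN it references are safe."""
    if row.scn > last_safe:
        return False
    if row.ref_scn is not None and row.ref_scn > last_safe:
        return False
    return True


def filter_rows(rows, last_safe):
    """Keep only the rows a reader is allowed to see for the given marker."""
    return [r for r in rows if visible(r, last_safe)]


update = [
    Row(1001, "AAA"),
    Row(1002, "BBB"),
    Row(800, "1001", ref_scn=1001),
    Row(1003, "DDD"),
]

print(filter_rows(update, 1000))  # []: nothing from the update is visible yet
print(len(filter_rows(update, 1003)))  # 4: the whole update is visible
```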

The "last safe SCN" is stored somewhere, and for this pattern to work I once
again need "execution order" consistency - no reader should ever see the "last
safe: 1003" line before all the previous lines were committed; and any reader
who saw the "last safe: 1003" line must be able to see all the lines from that
update just as they are right now.
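
The ordering I am hoping for can be sketched with an in-memory stand-in. The
`Store` class and both functions are hypothetical and single-threaded, so this
only shows the intended order of operations (the writer publishes the marker
last, the reader reads the marker first); whether Cassandra preserves that
order between a writer and concurrent readers is exactly my question. For
brevity the sketch only inserts new keys and leaves out the update of line 800.

```python
class Store:
    """Hypothetical in-memory stand-in for the two tables; not a Cassandra API."""

    def __init__(self, last_safe):
        self.rows = {}              # second table: SCN -> data
        self.last_safe = last_safe  # separate "last safe SCN" marker


def write_batch(store, batch, new_last_safe):
    """Writer: apply the rows one by one and publish the marker only at the end.

    Written as a generator that yields after every step, so that a reader can
    be interleaved between any two writes.
    """
    for scn, data in batch:
        store.rows[scn] = data
        yield
    store.last_safe = new_last_safe
    yield


def read_snapshot(store):
    """Reader: read the marker first, then the rows, and drop anything newer."""
    safe = store.last_safe
    return safe, {scn: data for scn, data in store.rows.items() if scn <= safe}


store = Store(last_safe=1000)
batch = [(1001, "AAA"), (1002, "BBB"), (1003, "DDD")]
for _ in write_batch(store, batch, new_last_safe=1003):
    safe, rows = read_snapshot(store)
    print(safe, sorted(rows))
```

With this ordering, a reader interleaved at any step sees either none of the
new lines (marker still 1000) or all of them (marker 1003).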

Is this possible to do in Cassandra?

