cassandra-user mailing list archives

From A J <>
Subject Re: Does the 'batch' order matter ?
Date Thu, 15 Mar 2012 15:23:12 GMT
OK, that is disappointing. Had the order been guaranteed, you could have
got atomicity-like behavior most of the time.

How does one execute a logical write that is spread across several CFs?
(Say in the User CF you have 'state' as a column and userid as the row key,
but in the State CF you have state as the row key and userid as a column.)
Given that atomicity is not possible, a brief period of inconsistency is
OK, but I cannot afford permanent inconsistency for even a single
successful or timed-out write.
I can never have a userid in the User CF that is not in the State CF, or
vice versa, except for a very small fraction of writes, and then only for
a few minutes at most. The write to the State CF has to be almost always
synchronous with the write to the User CF.

I would guess this is a general enough use case. How is it accomplished?
Do I write to a third CF, say the 'LOG CF', with a PREPARING status as the
first batch? Then the second batch, conditional on the first batch being
successful, writes to the main User and State CFs. Then the third batch,
conditional on the second batch being successful, updates the PREPARING
flag to COMPLETED in the LOG CF?
I would also run a standalone job every few minutes that takes PREPARING
records older than some interval from the LOG CF, applies them to the
main CFs, and changes their status.
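
To make the three-batch idea concrete, here is a minimal sketch in Python.
It uses plain dicts as stand-ins for the three CFs rather than a real
client, and every name in it (user_cf, state_cf, log_cf, apply_main,
logical_write, repair) is hypothetical. It also assumes a user's state is
written once and never moved, so it does not remove stale State CF entries.

```python
import time
import uuid

# In-memory stand-ins for the three column families (hypothetical; a real
# client would issue one batch per step instead of mutating dicts).
user_cf = {}    # userid -> {"state": state}
state_cf = {}   # state  -> {userid: ""}
log_cf = {}     # log id -> {"status", "userid", "state", "ts"}


def apply_main(userid, state):
    """Idempotent write to both main CFs, so it is safe to replay."""
    user_cf.setdefault(userid, {})["state"] = state
    state_cf.setdefault(state, {})[userid] = ""


def logical_write(userid, state, fail_before_main=False):
    log_id = str(uuid.uuid4())
    # Batch 1: record the intent with PREPARING status.
    log_cf[log_id] = {"status": "PREPARING", "userid": userid,
                      "state": state, "ts": time.time()}
    if fail_before_main:
        # Simulate a crash or timeout after batch 1.
        return log_id
    # Batch 2: write the main CFs (only reached if batch 1 succeeded).
    apply_main(userid, state)
    # Batch 3: mark the intent as COMPLETED.
    log_cf[log_id]["status"] = "COMPLETED"
    return log_id


def repair(max_age_seconds, now=None):
    """Standalone job: replay PREPARING entries older than the interval."""
    now = time.time() if now is None else now
    replayed = 0
    for entry in log_cf.values():
        old_enough = now - entry["ts"] >= max_age_seconds
        if entry["status"] == "PREPARING" and old_enough:
            apply_main(entry["userid"], entry["state"])
            entry["status"] = "COMPLETED"
            replayed += 1
    return replayed
```

The replay is only safe because apply_main is idempotent: writing the same
columns twice leaves the same result, so the job can re-apply an entry
whose second batch had in fact partly or fully succeeded.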

This approach may not be performant, but I could not think of anything
else. I would appreciate any ideas.


On Thu, Mar 15, 2012 at 5:22 AM, aaron morton <> wrote:
> The simple thing to say is: if you send a batch_mutate, the order in which
> the rows are written is undefined. So you should not make any assumptions
> such as: if row C is stored, rows A and B have been stored as well.
> They may have been, but AFAIK it is not part of the API contract.
> For the Thrift API, batch_mutate takes a Map of mutations keyed on the row
> key. CQL builds a list of row mutations in the same order as the statement.
> Even if they are in a list, there is no guarantee they will be processed in
> that order.
> If you get a timed-out error, all you know is that the mutation, as a
> whole, was applied on fewer than CL nodes.
> Cheers
> -----------------
> Aaron Morton
> Freelance Developer
> @aaronmorton
> On 15/03/2012, at 1:22 PM, Tyler Hobbs wrote:
> Ah, my mistake, you are correct. Not sure why I had forgotten that.
> The pycassa docs are slightly wrong there, though.  It's technically atomic
> for the same key across multiple column families.  I'll get that fixed.
> On Wed, Mar 14, 2012 at 5:22 PM, A J <> wrote:
>> > No, batch_mutate() is an atomic operation.  When a node locally applies
>> > a batch mutation, either all of the changes are applied or none of them
>> > are.<
>> The steps in my batch are not confined to a single CF, nor to a single
>> key.
>> The documentation says:
>> datastax:
>> Column updates are only considered atomic within a given record (row).
>> Pycassa.batch:
>> This interface does not implement atomic operations across column
>> families. All the limitations of the batch_mutate Thrift API call
>> applies. Remember, a mutation in Cassandra is always atomic per key
>> per column family only.
>> On Wed, Mar 14, 2012 at 4:15 PM, Tyler Hobbs <> wrote:
>> > On Wed, Mar 14, 2012 at 11:50 AM, A J <> wrote:
>> >>
>> >>
>> >> Are you saying the way 'batch mutate' is coded, the order of writes in
>> >> the batch does not mean anything ? You can ask the batch to do A,B,C
>> >> and then D in sequence; but sometimes Cassandra can end up applying
>> >> just C and A,B (and D) may still not be applied ?
>> >
>> >
>> > No, batch_mutate() is an atomic operation.  When a node locally applies
>> > a
>> > batch mutation, either all of the changes are applied or none of them
>> > are.
>> >
>> > Aaron was referring to the possibility that one of the replicas received
>> > the
>> > batch_mutate, but the other replicas did not.
>> >
>> > --
>> > Tyler Hobbs
>> > DataStax
>> >
> --
> Tyler Hobbs
> DataStax
