cassandra-user mailing list archives

From Mickael Delanoë <delanoe...@gmail.com>
Subject Re: Batch : Isolation and Atomicity for same partition on multiple table
Date Fri, 15 Dec 2017 08:32:12 GMT
Yes, we try to rely on conditional batches when possible, but in this case
they could not be used.
We ran some tests with conditional batches, and they cannot be applied when
several tables are involved in the batch, even if the tables use the same
partition key: we got the error "batch with conditions cannot span multiple
tables".
So they are not an option in our case.
Moreover, we would like "isolation" to ensure that a read occurring while
the batch is being applied sees all of the data in every table (not only
part of it), which is not achievable with conditional batches.

Mickaël




On 15 Dec 2017 at 07:12, "Jeff Jirsa" <jjirsa@gmail.com> wrote:

Again, a lot of potential problems can be solved with data modeling - in
particular consider things like conditional batches where the condition is
on a static cell/column and writes go to different CQL rows.
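
A minimal sketch of that idea, assuming the two tables can be merged into
one so that every write in the batch lands in a single partition of a single
table. The table name user_data, the kind discriminator column, the '-'
placeholder and the version column are all hypothetical; only user_id and
the clustering and value columns come from the schema quoted later in this
thread.

```cql
-- Hypothetical merged table: rows formerly in tableA and tableB share one
-- partition per user_id and are told apart by the "kind" clustering column.
CREATE TABLE user_data (
    user_id     text,
    kind        text,        -- 'A' or 'B' (assumed discriminator)
    clustering1 text,
    clustering2 text,        -- '-' placeholder for rows of kind 'A'
    value       text,
    version     int STATIC,  -- one cell per partition, used as the condition
    PRIMARY KEY (user_id, kind, clustering1, clustering2)
);

-- Conditional batch: the condition is on the static column, and the writes
-- go to different CQL rows of the same partition.
BEGIN BATCH
    UPDATE user_data SET version = 2 WHERE user_id = '1234' IF version = 1;
    INSERT INTO user_data (user_id, kind, clustering1, clustering2, value)
        VALUES ('1234', 'A', 'c1', '-', 'val1');
    INSERT INTO user_data (user_id, kind, clustering1, clustering2, value)
        VALUES ('1234', 'B', 'cl1', 'cl2', 'avalue');
APPLY BATCH;
```

The condition only applies if version is already 1 for that partition, so
the partition would need to be initialized beforehand (for example with an
INSERT ... IF NOT EXISTS). Because every statement targets one partition of
one table, the "cannot span multiple tables" restriction should not come
into play here.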

-- 
Jeff Jirsa


On Dec 14, 2017, at 9:57 PM, Mickael Delanoë <delanoemic@gmail.com> wrote:

Thanks Jeff,
I am a little disappointed to hear that the guarantees are even weaker. But
I will take a look at this and try to understand what is really done.



On 13 Dec 2017 at 18:18, "Jeff Jirsa" <jjirsa@gmail.com> wrote:

Entry point is here:
https://github.com/apache/cassandra/blob/trunk/src/java/org/apache/cassandra/cql3/statements/BatchStatement.java#L346
which will call through to
https://github.com/apache/cassandra/blob/trunk/src/java/org/apache/cassandra/service/StorageProxy.java#L938-L953

I believe the guarantees are weaker than the blog suggests, but it's
nuanced, and a lot of these types of questions come down to data model (you
can model it in a way that you can avoid problems with weaknesses in
isolation, but that requires a detailed explanation of your use case, etc).




On Wed, Dec 13, 2017 at 8:56 AM, Mickael Delanoë <delanoemic@gmail.com>
wrote:

> Hi Nicolas,
> Thanks for your answer.
> Are you 100% sure of this assumption?
> The few tests I did, using nodetool getendpoints, showed that when I used
> the same partition key, the data for the two tables went to the same
> "nodes". So I would have expected Cassandra to be smart enough to apply
> them to the memtable in a single operation to achieve isolation, as the
> whole batch will be executed on a single node.
> Does anybody know where batch operations are processed in the Cassandra
> source code, so I could check how all this is handled?
>
> Regards,
> Mickaël
>
>
>
> 2017-12-13 11:18 GMT+01:00 Nicolas Guyomar <nicolas.guyomar@gmail.com>:
>
>> Hi Mickael,
>>
>> Partitions are tied to the table they exist in, so in your case you
>> are targeting 2 partitions in 2 different tables.
>> Therefore, IMHO, you will only get atomicity from your batch statement.
>>
>> On 11 December 2017 at 15:59, Mickael Delanoë <delanoemic@gmail.com>
>> wrote:
>>
>>> Hello,
>>>
>>> I have a question regarding batch isolation and atomicity with queries
>>> using the same partition key.
>>>
>>> The Datastax documentation says about batches:
>>> "Combines multiple DML statements to achieve atomicity and isolation
>>> when targeting a single partition or only atomicity when targeting multiple
>>> partitions. A batch applies all DMLs within a single partition before the
>>> data is available, ensuring atomicity and isolation."
>>>
>>> But I am trying to find out exactly what can be considered a "single
>>> partition", and I cannot find a clear answer yet. The examples and
>>> explanations always speak about a partition with only one table used
>>> inside the batch. My concern is about partitions when we use different
>>> tables in a batch. So I would like some clarification.
>>>
>>> Here is my use case: I have 2 tables with the same partition key,
>>> "user_id":
>>>
>>> CREATE TABLE tableA (
>>>    user_id text,
>>>    clustering text,
>>>    value text,
>>>    PRIMARY KEY (user_id, clustering));
>>>
>>> CREATE TABLE tableB (
>>>    user_id text,
>>>    clustering1 text,
>>>    clustering2 text,
>>>    value text,
>>>    PRIMARY KEY (user_id, clustering1, clustering2));
>>>
>>> If I do a batch query like this :
>>>
>>> BEGIN BATCH
>>> INSERT INTO tableA (user_id, clustering, value)
>>>   VALUES ('1234', 'c1', 'val1');
>>> INSERT INTO tableB (user_id, clustering1, clustering2, value)
>>>   VALUES ('1234', 'cl1', 'cl2', 'avalue');
>>> APPLY BATCH;
>>>
>>> The DML statements use the same partition key. Can we say they are
>>> targeting the same partition or, as the partition keys belong to
>>> different tables, should we consider them different partitions? And so
>>> does this batch ensure atomicity and isolation (in the sense described in
>>> the Datastax doc), or only atomicity?
>>>
>>> Thanks for your help,
>>> Mickaël Delanoë
>>>
>>
>>
>
>
> --
> Mickaël Delanoë
>
