cassandra-commits mailing list archives

Site index · List index
Message view « Date » · « Thread »
Top « Date » · « Thread »
From "Dennis Roppelt (JIRA)" <j...@apache.org>
Subject [jira] [Comment Edited] (CASSANDRA-10085) Allow BATCH with conditions to span multiple tables with same partition key
Date Mon, 07 Sep 2015 07:07:45 GMT

    [ https://issues.apache.org/jira/browse/CASSANDRA-10085?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14733311#comment-14733311
] 

Dennis Roppelt edited comment on CASSANDRA-10085 at 9/7/15 7:06 AM:
--------------------------------------------------------------------

Hello Sylvain,
Thank you for the quick and informative reply.
I already anticipated that this would not be trivial to implement, and might go against some
design principles created in C*.
I also can see why this won't be implemented soon or at all. I will look into a different
approach of modelling my data or see if another database is a better fit (obviously standard
RDBMS will work for the model).

It is a bit disappointing though, that this limitation was not mentioned in the documentation.
Compared to other NoSQL-DBs, Cassandra usually has an outstandingly good documentation, with
lots of examples and details.
How would I suggest adding a hint about the limitiation with batch + conditional statements
across tables? Creating a new issue and link to this one?


was (Author: roppeldidu):
Hello Sylvain,
Thank you for the quick and informative reply.
I already anticipated that this would not be trivial to implement, and might go against some
design principles created in C*.
I also can see why this won't be implemented soon or at all. I will look into a different
approach of modelling my data or see if another database is a better fit (obviously standard
RDBMS will work for the model).

It is a bit disappointing though, that this limitation was not mentioned in the documentation.
Compared to other NoSQL-DBs, Cassandra has an outstandingly good documentation.
How would I suggest adding a hint about the limitiation with batch + conditional statements
across tables? Creating a new issue and link to this one?

> Allow BATCH with conditions to span multiple tables with same partition key
> ---------------------------------------------------------------------------
>
>                 Key: CASSANDRA-10085
>                 URL: https://issues.apache.org/jira/browse/CASSANDRA-10085
>             Project: Cassandra
>          Issue Type: Improvement
>            Reporter: Dennis Roppelt
>              Labels: batch, conditional-statement, partitioning
>
> h4. Use case:
> I want to use batch to keep data between tables synchronized with help of conditionals.
In particular, I want to have a ONE-to-ONE relationship in my data model. I dont want to have
an unintended upsert while trying to maintain a consistent one-to-one relationship (write
requests from other clients that try to create a different relationship with one of the keys
I inteded to use), which is why I would like the IF NOT EXISTS when inserting data into multiple
tables in a batch.
> But when trying to insert data in a batch-block with an IF NOT EXISTS, I get the following
response (reproduceable example further below):
> {code}
> cqlsh:testkeyspace> 
> BEGIN BATCH 
> INSERT INTO key1_to_key2 (partKey, key1, key2) values (1,2,3) IF NOT EXISTS; 
> INSERT INTO key2_to_key1 (partKey, key1, key2) values (1,2,3) IF NOT EXISTS; 
> APPLY BATCH;
> Bad Request: Batch with conditions cannot span multiple tables{code}
> The tables have a primary key on (partKey, key1) and (partKey, key2). Which means that
partKey in both cases is their partition key. Which in the example also is the same value.
> h4. Why I want to use a BATCH-statement that way:
> In traditional databases, to design a one-to-one relationship, I would create one table
with two keys, both with a unique constraint:
> {code:sql}
> CREATE TABLE  `myOneToOneTable` (
>   `ID` int NOT NULL,
>   `KEY_ONE` int NOT NULL,
>   `KEY_TWO` int NOT NULL,
> // some other fields
>   PRIMARY KEY (`ID`),
>   UNIQUE KEY `KEY_ONE`,
>   UNIQUE KEY `KEY_TWO`
> );
> {code}
> (simplified example taken from an [one-to-one example for mysql|http://www.mkyong.com/mysql/how-to-define-one-to-one-relationship-in-mysql/])
> As far as I know, unique constraints are not supported in Cassandra, so another way around
it would be to maintain the relationshipt between two tables, each sharing the same partition
key. For which I would like Cassandra to make sure that when I insert data, either both are
inserted at the same time or none (to prevent other writing client to insert a different one-to-one
relationship than me). 
> [According to the documentation, this is what BATCH-Statements are used for|http://docs.datastax.com/en/cql/3.3/cql/cql_using/useBatch.html]
> {quote}
> Instead, using batches to synchronize data to tables is a legitimate operation. 
> {quote}
> I expected it to be able to do it, but as seen above, I cannot enforce INSERT IF NOT
EXISTS over multiple tables.
> I do understand the background as to *why* there is a limitation coded that way, so that
not too many nodes have to take part in processing of the batch. However, if the operations
use
> # primary keys in their statements for each table
> # all primary keys contain the same value, thus resulting in the same hash for partioning
> # tables belong to the same keyspace
> the batch statement should be able to be limited to the same amount of nodes as if the
batch contained INSERT IF NOT EXISTs for only one table. 
> h4. Steps to reproduce desired behaviour:
> I used the [cassandra ova on virtual box|http://www.planetcassandra.org/install-cassandra-ova-on-virtualbox/],
but also a Windows Setup with 2.2.0
> {code}
> Connected to Cluster on a Stick at localhost:9160.                              
> [cqlsh 4.1.1 | Cassandra 2.0.7 | CQL spec 3.1.1 | Thrift protocol 19.39.0]      
> Use HELP for help.                                                              
> cqlsh> CREATE KEYSPACE conditionalBatch                                         
> WITH REPLICATION = {'class':'SimpleStrategy','replication_factor':1};    
> cqlsh> use conditionalbatch ;                                                   
> cqlsh:conditionalbatch> CREATE TABLE key1_to_key2 (                             
> partKey int,                                            
> key1 int,                                               
> key2 int,                                               
> PRIMARY KEY (partKey, key1)                             
> );                                                      
> cqlsh:conditionalbatch> CREATE TABLE key2_to_key1 ( partKey int, key2 int, key1 
> int, PRIMARY KEY (partKey, key2) 
> );                                             
> cqlsh:conditionalbatch> BEGIN BATCH                                             
> INSERT INTO key1_to_key2 (partKey, key1, key2) VALUES (1,2,3) IF NOT EXISTS;        
                                                   
> INSERT INTO key2_to_key1 (partKey, Key1, key2) VALUES (1,2,3) IF NOT EXISTS;        
                                                   
> APPLY BATCH;                                            
> {code}
> h4. What should happen:
> Either succeding, or:
> {code}
>  applied | partkey | key1 | key2|                                              
> -----------+---------+------+------                                             
>      false |       1 |    2 |    3                                              
>      false |       1 |    2 |    3   
> {code}
> h4. What happens instead:
> {code}
> Bad Request: Batch with conditions cannot span multiple tables 
> {code}
> In my opinion, this is a very crucial feature wich enables the users to maintain unique
constraints with a relatively small amount of work. There certainly are other ways to maintain
unique constrains, but I have not found a trivial way within cassandra that lets me that easily.
But maybe I have overlooked something though.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)

Mime
View raw message