cassandra-commits mailing list archives

From "Sylvain Lebresne (JIRA)" <j...@apache.org>
Subject [jira] [Commented] (CASSANDRA-10085) Allow BATCH with conditions to span multiple tables with same partition key
Date Sun, 16 Aug 2015 10:13:45 GMT

    [ https://issues.apache.org/jira/browse/CASSANDRA-10085?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14698615#comment-14698615 ]

Sylvain Lebresne commented on CASSANDRA-10085:
----------------------------------------------

I will simply start by noting that doing this is not trivial considering the current implementation
since everything Paxos related is currently grouped by cfId. And while changing that is not
too complex in itself, dealing with backward compatibility is (as always) more painful. And
of course, doing so would also potentially create contention between tables, which would be
desired in your case but might be detrimental in other cases, so we'd have to debate whether
that's acceptable (or if we make it an option, which adds complexity to the system).
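
As a rough illustration of the grouping and contention point above, here is a toy model in Python. It is a sketch under stated assumptions, not Cassandra's actual Paxos code: the `propose` helper, the dict-based state and the ballot numbers are all hypothetical, and only the idea of keying state by table id versus by partition key alone comes from the comment.

```python
# Toy model only: Paxos-like state held in a dict, keyed two different ways.
from collections import defaultdict


def make_state():
    # Maps a grouping key to the highest ballot promised so far (0 = none).
    return defaultdict(int)


def propose(state, group_key, ballot):
    """Accept the ballot only if it is newer than anything promised for group_key."""
    if ballot <= state[group_key]:
        return False  # lost to a competing proposal in the same group
    state[group_key] = ballot
    return True


# Grouped by (partition key, table id): the two tables never compete.
per_table = make_state()
a = propose(per_table, (1, "key1_to_key2"), ballot=5)
b = propose(per_table, (1, "key2_to_key1"), ballot=4)

# Grouped by partition key alone: the older ballot for the second table loses.
shared = make_state()
c = propose(shared, 1, ballot=5)
d = propose(shared, 1, ballot=4)

print(a, b, c, d)
```

In the second grouping, an unrelated conditional write to another table with the same partition key can now be rejected, which is the cross-table contention that would have to be debated.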

I'll also note that:
# conditionals are currently pretty costly in C* and should be used sparingly.
# your justification for this sounds like you're trying to translate as directly as possible
what you do in SQL DBs into C*. That's rarely the right approach.
That said, JIRA is not the right place to discuss modeling options so I could maybe encourage
you to start by sending a mail on the mailing list asking others how they usually deal with
the kind of issue you're trying to deal with. You might get satisfying answers.

But to manage expectations, given the technical difficulties above and the fact that we're
not as convinced as you are that this is crucial (if we were, we would have designed it to
allow it in the first place), I would not expect this to be dealt with in the short term.

Which is not to say that this wouldn't be a nice to have all other things considered.

> Allow BATCH with conditions to span multiple tables with same partition key
> ---------------------------------------------------------------------------
>
>                 Key: CASSANDRA-10085
>                 URL: https://issues.apache.org/jira/browse/CASSANDRA-10085
>             Project: Cassandra
>          Issue Type: Improvement
>            Reporter: Dennis Roppelt
>              Labels: batch, conditional-statement, partitioning
>
> h4. Use case:
> I want to use a batch to keep data between tables synchronized with the help of conditionals.
In particular, I want to have a one-to-one relationship in my data model. I don't want to have
an unintended upsert while trying to maintain a consistent one-to-one relationship (write
requests from other clients that try to create a different relationship with one of the keys
I intended to use), which is why I would like to use IF NOT EXISTS when inserting data into multiple
tables in a batch.
> But when trying to insert data in a batch block with IF NOT EXISTS, I get the following
response (reproducible example further below):
> {code}
> cqlsh:testkeyspace> 
> BEGIN BATCH 
> INSERT INTO key1_to_key2 (partKey, key1, key2) values (1,2,3) IF NOT EXISTS; 
> INSERT INTO key2_to_key1 (partKey, key1, key2) values (1,2,3) IF NOT EXISTS; 
> APPLY BATCH;
> Bad Request: Batch with conditions cannot span multiple tables{code}
> The tables have primary keys (partKey, key1) and (partKey, key2) respectively, which means that
partKey is the partition key in both cases. In the example it also has the same value in both statements.
> h4. Why I want to use a BATCH-statement that way:
> In traditional databases, to design a one-to-one relationship, I would create one table
with two keys, both with a unique constraint:
> {code:sql}
> CREATE TABLE `myOneToOneTable` (
>   `ID` int NOT NULL,
>   `KEY_ONE` int NOT NULL,
>   `KEY_TWO` int NOT NULL,
>   -- some other fields
>   PRIMARY KEY (`ID`),
>   UNIQUE KEY `KEY_ONE` (`KEY_ONE`),
>   UNIQUE KEY `KEY_TWO` (`KEY_TWO`)
> );
> {code}
> (simplified example taken from a [one-to-one example for mysql|http://www.mkyong.com/mysql/how-to-define-one-to-one-relationship-in-mysql/])
> As far as I know, unique constraints are not supported in Cassandra, so a way around
this would be to maintain the relationship across two tables, each sharing the same partition
key. For this I would like Cassandra to make sure that when I insert data, either both rows are
inserted at the same time or neither is (to prevent another writing client from inserting a different one-to-one
relationship than mine).
> [According to the documentation, this is what BATCH-Statements are used for|http://docs.datastax.com/en/cql/3.3/cql/cql_using/useBatch.html]
> {quote}
> Instead, using batches to synchronize data to tables is a legitimate operation. 
> {quote}
> I expected this to work, but as seen above, I cannot enforce INSERT IF NOT
EXISTS across multiple tables.
> I do understand *why* the limitation is coded that way: so that
not too many nodes have to take part in processing the batch. However, if
> # the statements use the primary key of each table
> # all partition keys contain the same value, thus resulting in the same hash for partitioning
> # the tables belong to the same keyspace
> then the batch statement should be able to be limited to the same number of nodes as if the
batch contained INSERT IF NOT EXISTS statements for only one table.
> h4. Steps to reproduce desired behaviour:
> I used the [cassandra ova on virtual box|http://www.planetcassandra.org/install-cassandra-ova-on-virtualbox/],
but also a Windows Setup with 2.2.0
> {code}
> Connected to Cluster on a Stick at localhost:9160.                              
> [cqlsh 4.1.1 | Cassandra 2.0.7 | CQL spec 3.1.1 | Thrift protocol 19.39.0]      
> Use HELP for help.                                                              
> cqlsh> CREATE KEYSPACE conditionalBatch                                         
> WITH REPLICATION = {'class':'SimpleStrategy','replication_factor':1};    
> cqlsh> use conditionalbatch ;                                                   
> cqlsh:conditionalbatch> CREATE TABLE key1_to_key2 (                             
> partKey int,                                            
> key1 int,                                               
> key2 int,                                               
> PRIMARY KEY (partKey, key1)                             
> );                                                      
> cqlsh:conditionalbatch> CREATE TABLE key2_to_key1 ( partKey int, key2 int, key1 
> int, PRIMARY KEY (partKey, key2) 
> );                                             
> cqlsh:conditionalbatch> BEGIN BATCH
> INSERT INTO key1_to_key2 (partKey, key1, key2) VALUES (1,2,3) IF NOT EXISTS;
> INSERT INTO key2_to_key1 (partKey, Key1, key2) VALUES (1,2,3) IF NOT EXISTS;
> APPLY BATCH;
> {code}
> h4. What should happen:
> Either succeeding, or:
> {code}
>  [applied] | partkey | key1 | key2
> -----------+---------+------+------
>      false |       1 |    2 |    3
>      false |       1 |    2 |    3
> {code}
> h4. What happens instead:
> {code}
> Bad Request: Batch with conditions cannot span multiple tables 
> {code}
> In my opinion, this is a crucial feature which would let users maintain unique
constraints with a relatively small amount of work. There certainly are other ways to maintain
unique constraints, but I have not found a trivial way within Cassandra that lets me do that easily.
But maybe I have overlooked something.
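
The reporter's argument that statements on different tables with the same partition key value, in the same keyspace, involve the same nodes can be sketched as follows. This is a simplified illustration, not Cassandra's placement code: the four-node `RING`, the node names, the MD5-based `token` function and the `replicas` helper are all assumptions made for the example. What it relies on is that the token is derived from the partition key and that replication is configured per keyspace, so the table name does not enter the calculation.

```python
import hashlib
from bisect import bisect_left

# Hypothetical four-node ring; the tokens and node names are illustration values.
RING = sorted([
    (0, "node-a"),
    (2**126, "node-b"),
    (2**127, "node-c"),
    (3 * 2**126, "node-d"),
])


def token(partition_key: int) -> int:
    # Stand-in partitioner: hash the partition key value only (128-bit MD5).
    digest = hashlib.md5(str(partition_key).encode()).digest()
    return int.from_bytes(digest, "big")


def replicas(table: str, partition_key: int, rf: int = 1) -> list:
    # `table` is deliberately unused: within one keyspace, placement here
    # depends only on the token and the replication factor.
    tokens = [t for t, _ in RING]
    start = bisect_left(tokens, token(partition_key)) % len(RING)
    return [RING[(start + i) % len(RING)][1] for i in range(rf)]


same_nodes = replicas("key1_to_key2", 1, rf=2) == replicas("key2_to_key1", 1, rf=2)
print(same_nodes)
```

Under these assumptions both statements of the example batch land on the same replicas. The sketch only covers replica placement; it says nothing about how conditional-update state is grouped on those replicas.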



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)
