cassandra-commits mailing list archives

From "Jay Patel (JIRA)" <>
Subject [jira] [Commented] (CASSANDRA-8854) Support for Async Atomic Batch
Date Tue, 24 Feb 2015 05:35:13 GMT


Jay Patel commented on CASSANDRA-8854:

I agree that it feels like it breaks the concept of a batch.
Looking at it from a different angle, a logged batch in C* is actually eventually atomic. In most
cases, I think, the use case does not require reading the data written by the statements in a logged
batch immediately, since the batch may be only partially executed at that point. The only difference
between a successfully executed "sync" and "async" logged batch is that with "sync" we know upfront
which statements failed (and they will eventually succeed anyway), whereas with "async" we do
not know upfront which "async" statements failed. I don't think knowing which statements failed in
a sync logged batch helps the application much in dramatically changing its course of action.
So I feel that an async option for logged batches would be very useful (for almost all logged
batch use cases) from a performance standpoint, without losing atomicity or any other feature
of logged batches. Please correct me if I have misunderstood anything about logged batches.

A client-side batch log also sounds like a good idea. But it may be harder for a client-side
"batchlog" to guarantee the same level of atomicity that C* logged batches provide (the C* logged
batch sits closer to C*). Another issue is that a client-side "batchlog" table/file needs
to be persisted in some database or filesystem that will not scale the way C* does and
can soon become a bottleneck.

Other than this, it seems that batch statements are executed sequentially. If so, is it possible
to provide an option to execute them in parallel (the first statement sequentially and the
rest in parallel, if the first is successful)? I can track this as another ticket and look
into it.
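
To make the "first sequential, rest in parallel" idea concrete, here is a minimal sketch in plain Python. It is not Cassandra code: run_first_then_parallel and the execute callable are hypothetical names used only for illustration, and a thread pool stands in for whatever concurrency the coordinator would actually use.

```python
from concurrent.futures import ThreadPoolExecutor


def run_first_then_parallel(statements, execute):
    """Run statements[0] alone; only if it succeeds, run the rest concurrently.

    `execute` is a hypothetical callable that applies one statement and
    raises on failure. Results are returned in statement order.
    """
    if not statements:
        return []
    # If the first statement raises, nothing else is attempted.
    first_result = execute(statements[0])
    rest = statements[1:]
    if not rest:
        return [first_result]
    with ThreadPoolExecutor(max_workers=len(rest)) as pool:
        # pool.map keeps input order even though the calls run concurrently.
        rest_results = list(pool.map(execute, rest))
    return [first_result] + rest_results
```

If the first statement fails, the exception propagates and none of the remaining statements are submitted, which matches the "if the first is successful" condition above.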


> Support for Async Atomic Batch
> ------------------------------
>                 Key: CASSANDRA-8854
>                 URL:
>             Project: Cassandra
>          Issue Type: New Feature
>            Reporter: Jay Patel
> Use cases sometimes demand atomicity (using a C* logged batch) across multiple DML statements;
> however, in order to minimize end-user latency, we do not want to wait for all the statements
> to be executed.
> For instance, would like to have something like:
>   Sync - INSERT INTO users (userID, name, email) VALUES ('user1', 'first user', '');
>   Async - INSERT INTO users_by_name (name, userID) VALUES ('first user', 'user1');
>   Async - INSERT INTO users_by_email (name, userID) VALUES ('', 'user1');
>   ..... more Async statements!
> Once the batch is serialized to the batchlog table and the sync statements are executed,
> the coordinator should return a response without waiting for the async batch statements to execute.
> Some of the use cases that we're working on would benefit significantly in terms
> of latency reduction. I can take a first cut at it if we don't see any concerns supporting
> Also, need some discussions around specifying sync/async tag for each statement in the
> Thoughts welcome. Thanks!
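
The flow proposed in the description (write the batchlog, apply the sync statements, respond, apply the async statements later) can be sketched as a toy in-memory model. This is only an illustration under stated assumptions: ToyCoordinator, execute_batch and drain are hypothetical names, statements are plain strings, and nothing here reflects Cassandra's actual batchlog implementation.

```python
from collections import deque


class ToyCoordinator:
    """In-memory stand-in for a coordinator; not Cassandra's real code path."""

    def __init__(self):
        self.batchlog = {}      # batch_id -> statements not yet applied
        self.applied = []       # statements applied so far, in order
        self.pending = deque()  # batch ids that still have async work
        self._next_id = 0

    def execute_batch(self, sync_stmts, async_stmts):
        batch_id = self._next_id
        self._next_id += 1
        # 1. Persist the whole batch first so it could be replayed later.
        self.batchlog[batch_id] = list(sync_stmts) + list(async_stmts)
        # 2. Apply the sync statements before responding.
        for stmt in sync_stmts:
            self._apply(batch_id, stmt)
        # 3. Defer the async statements and return to the caller.
        if async_stmts:
            self.pending.append(batch_id)
        else:
            del self.batchlog[batch_id]
        return batch_id

    def _apply(self, batch_id, stmt):
        self.applied.append(stmt)
        self.batchlog[batch_id].remove(stmt)

    def drain(self):
        """Background work (or replay): apply what the batchlog still holds."""
        while self.pending:
            batch_id = self.pending.popleft()
            for stmt in list(self.batchlog[batch_id]):
                self._apply(batch_id, stmt)
            del self.batchlog[batch_id]
```

In this model, the caller gets its response after step 3 while the batchlog entry still lists the async statements; the entry is removed only once drain has applied them, which is where the "eventually atomic" behaviour comes from.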

This message was sent by Atlassian JIRA
