cassandra-commits mailing list archives

From "Les Hazlewood (JIRA)" <>
Subject [jira] [Commented] (CASSANDRA-5959) CQL3 support for multi-column insert in a single operation (Batch Insert / Batch Mutate)
Date Tue, 03 Sep 2013 18:43:53 GMT


Les Hazlewood commented on CASSANDRA-5959:

[~slebresne] Thanks for the added comments!  We're fine upgrading to 2.0 now that it has been
made final and released, so we will be able to benefit from the CASSANDRA-4693 fix.

As you suggested, I do like the idea of adding the proposed syntax as a convenience
(understanding that it wouldn't be a performance improvement).  But since this issue originally
reflected the performance needs of our software (and not those of a person using cqlsh), our
particular concern has been satisfied by the release of C* 2.0.

I'll let someone else resurrect this issue if they feel it is desirable enough to consume
C* software development time/resources. :)
> CQL3 support for multi-column insert in a single operation (Batch Insert / Batch Mutate)
> ----------------------------------------------------------------------------------------
>                 Key: CASSANDRA-5959
>                 URL:
>             Project: Cassandra
>          Issue Type: New Feature
>          Components: Core, Drivers
>            Reporter: Les Hazlewood
>              Labels: CQL
> h3. Impetus for this Request
> (from the original [question on StackOverflow|]):
> I want to insert a single row with 50,000 columns into Cassandra 1.2.9. Before inserting,
> I have all the data for the entire row ready to go (in memory):
> {code}
> +---------+------+------+------+------+-------+
> |         | 0    | 1    | 2    | ...  | 49999 |
> | row_id  +------+------+------+------+-------+
> |         | text | text | text | ...  | text  |
> +---------+------+------+------+------+-------+
> {code}
> The column names are integers, allowing slicing for pagination. Each column value is
> the text stored at that particular index.
> CQL3 table definition:
> {code}
> create table results (
>     row_id text,
>     index int,
>     value text,
>     primary key (row_id, index)
> ) 
> with compact storage;
> {code}
> As I already have the row_id and all 50,000 name/value pairs in memory, I just want to
> insert a single row into Cassandra in a single request/operation so it is as fast as possible.
> The only thing I can seem to find is to execute the following 50,000 times:
> {code}
> INSERT INTO results (row_id, index, value) values (my_row_id, ?, ?);
> {code}
> where the first {{?}} is an index counter ({{i}}) and the second {{?}} is the text
> value to store at location {{i}}.
> With the Datastax Java Driver client and C* server on the same development machine, this
> took a full minute to execute.
> Oddly enough, the same 50,000 insert statements in a [Datastax Java Driver Batch|]
> on the same machine took 7.5 minutes.  I thought batches were supposed to be _faster_ than
> individual inserts?
> We tried instead with a Thrift client (Astyanax) and the same insert via a [MutationBatch|].
> This took _235 milliseconds_.
> h3. Feature Request
> As a result of this performance testing, this issue is to request that CQL3 support batch
> mutation operations as a single operation (statement) to ensure the same speed/performance
> benefits as existing Thrift clients.
> Example suggested syntax (based on the above example table/column family):
> {code}
> insert into results (row_id, (index,value)) values 
>     ((0,text0), (1,text1), (2,text2), ..., (N,textN));
> {code}
> Each value in the {{values}} clause is a tuple.  The first tuple element is the column
> name, the second tuple element is the column value.  This seems to be the simplest and most
> accurate representation of what happens during a batch insert/mutate.
> Not having this CQL feature forced us to remove the Datastax Java Driver (which we liked)
> in favor of Astyanax because Astyanax supports this behavior.  We desire feature/performance
> parity between Thrift and CQL3/Datastax Java Driver, so we hope this request improves both
> CQL3 and the Driver.
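
For comparison with the proposed tuple syntax, the per-column inserts described in the issue can already be grouped into one CQL3 statement with BEGIN UNLOGGED BATCH ... APPLY BATCH. The Java below is only a minimal sketch of what that statement text looks like for the results table above. BatchSketch, quote and buildBatch are hypothetical names, not part of the Datastax Java Driver or Astyanax, and the sketch inlines literals purely for illustration (real client code would more likely bind values through prepared statements). It makes no claim about performance.

```java
import java.util.List;

// Hypothetical helper for illustration only; not part of any driver API.
final class BatchSketch {

    // Render a CQL string literal, doubling any embedded single quotes.
    static String quote(String s) {
        return "'" + s.replace("'", "''") + "'";
    }

    // Build one unlogged batch that stores values.get(i) at column index
    // (offset + i) of the given row in the "results" table from the issue.
    static String buildBatch(String rowId, int offset, List<String> values) {
        StringBuilder cql = new StringBuilder("BEGIN UNLOGGED BATCH\n");
        for (int i = 0; i < values.size(); i++) {
            cql.append("  INSERT INTO results (row_id, index, value) VALUES (")
               .append(quote(rowId)).append(", ")
               .append(offset + i).append(", ")
               .append(quote(values.get(i))).append(");\n");
        }
        return cql.append("APPLY BATCH;").toString();
    }
}
```

The offset parameter is an assumption added so that the 50,000 pairs could be sent as several smaller batches instead of one very large statement; whether that helps would need to be measured on the target cluster.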
