cassandra-commits mailing list archives

From "Robert Stupp (JIRA)" <>
Subject [jira] [Commented] (CASSANDRA-5959) CQL3 support for multi-column insert in a single operation (Batch Insert / Batch Mutate)
Date Fri, 01 Aug 2014 14:50:39 GMT


Robert Stupp commented on CASSANDRA-5959:

Interesting. Thx :)

> CQL3 support for multi-column insert in a single operation (Batch Insert / Batch Mutate)
> ----------------------------------------------------------------------------------------
>                 Key: CASSANDRA-5959
>                 URL:
>             Project: Cassandra
>          Issue Type: New Feature
>          Components: Core, Drivers (now out of tree)
>            Reporter: Les Hazlewood
>              Labels: CQL
> h3. Impetus for this Request
> (from the original [question on StackOverflow|]):
> I want to insert a single row with 50,000 columns into Cassandra 1.2.9. Before inserting, I have all the data for the entire row ready to go (in memory):
> {code}
> +---------+------+------+------+------+-------+
> |         | 0    | 1    | 2    | ...  | 49999 |
> | row_id  +------+------+------+------+-------+
> |         | text | text | text | ...  | text  |
> +---------+------+------+------+------+-------+
> {code}
> The column names are integers, allowing slicing for pagination. Each column value is the text stored at that particular index.
> CQL3 table definition:
> {code}
> create table results (
>     row_id text,
>     index int,
>     value text,
>     primary key (row_id, index)
> ) 
> with compact storage;
> {code}
> As I already have the row_id and all 50,000 name/value pairs in memory, I just want to insert a single row into Cassandra in a single request/operation so it is as fast as possible.
> The only thing I can seem to find is to execute the following 50,000 times:
> {code}
> INSERT INTO results (row_id, index, value) values (my_row_id, ?, ?);
> {code}
> where the first {{?}} is an index counter ({{i}}) and the second {{?}} is the text value to store at location {{i}}.
> With the Datastax Java Driver client and C* server on the same development machine, this took a full minute to execute.
> Oddly enough, the same 50,000 insert statements in a [Datastax Java Driver Batch|] on the same machine took 7.5 minutes. I thought batches were supposed to be _faster_ than individual inserts?
> We tried instead with a Thrift client (Astyanax) and the same insert via a [MutationBatch|]. This took _235 milliseconds_.
> h3. Feature Request
> As a result of this performance testing, this issue is to request that CQL3 support batch mutation operations as a single operation (statement) to ensure the same speed/performance benefits as existing Thrift clients.
> Example suggested syntax (based on the above example table/column family):
> {code}
> insert into results (row_id, (index,value)) values 
>     ((0,text0), (1,text1), (2,text2), ..., (N,textN));
> {code}
> Each value in the {{values}} clause is a tuple. The first tuple element is the column name, the second tuple element is the column value. This seems to be the simplest and most accurate representation of what happens during a batch insert/mutate.
> Not having this CQL feature forced us to remove the Datastax Java Driver (which we liked) in favor of Astyanax because Astyanax supports this behavior. We desire feature/performance parity between Thrift and CQL3/Datastax Java Driver, so we hope this request improves both CQL3 and the Driver.
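
To put the timings in the quoted report side by side, here is a minimal Python sketch that converts each total into an approximate cost per column. It uses only the figures stated in the report (50,000 columns; one minute, 7.5 minutes, and 235 milliseconds). The helper name `per_column_ms` and the dictionary labels are made up for illustration, and nothing here talks to Cassandra or any driver.

```python
COLUMNS = 50_000  # columns in the single row, as stated in the report

# Wall-clock totals quoted in the report, converted to seconds.
timings_s = {
    "individual CQL3 INSERTs": 60.0,    # "a full minute"
    "Java Driver batch": 7.5 * 60,      # 7.5 minutes
    "Astyanax MutationBatch": 0.235,    # 235 milliseconds
}


def per_column_ms(total_seconds, columns=COLUMNS):
    """Average cost per column in milliseconds (hypothetical helper)."""
    return total_seconds * 1000.0 / columns


for name, seconds in timings_s.items():
    print(f"{name}: {per_column_ms(seconds):.4f} ms per column")
```

On those numbers the per-column cost is about 1.2 ms for individual inserts, 9 ms for the driver batch, and 0.0047 ms for the Thrift mutation batch. These are averages over a single reported run on one development machine, so they indicate scale and are not a benchmark.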

This message was sent by Atlassian JIRA
