cassandra-commits mailing list archives

From "Les Hazlewood (JIRA)" <>
Subject [jira] [Commented] (CASSANDRA-5959) CQL3 support for multi-column insert in a single operation (Batch Insert / Batch Mutate)
Date Fri, 30 Aug 2013 21:40:52 GMT


Les Hazlewood commented on CASSANDRA-5959:

While the suggestion to support slice queries on a Cassandra collection
would work for my particular example use case, I don't think it is the ideal solution
for C* in general: it would not work for any collection larger than 65,535
elements, since that is the C* maximum collection size.  If I choose to use a wide row to
hold more columns, I'd expect the query to work on that as well.
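
To make the wide-row case concrete, here is a minimal Python sketch of how a client might page through the `results` table from the issue description using range predicates on the clustering column rather than a collection. The helper names (`fits_in_collection`, `page_bounds`, `slice_query`) and the page size are hypothetical, the 65,535 limit is simply the figure quoted above, and the table and column names are copied verbatim from the issue. The code only builds query text and index ranges; it does not talk to a cluster.

```python
# Sketch only: helper names and page sizes are illustrative assumptions.
MAX_COLLECTION_ELEMENTS = 65535  # collection size limit cited in the comment above


def fits_in_collection(element_count):
    """Return True if the data could be stored in a single C* collection."""
    return element_count <= MAX_COLLECTION_ELEMENTS


def page_bounds(total_columns, page_size):
    """Split [0, total_columns) into half-open (start, end) index ranges."""
    return [
        (start, min(start + page_size, total_columns))
        for start in range(0, total_columns, page_size)
    ]


def slice_query():
    """CQL text for one slice of the wide row defined in the issue."""
    return (
        "SELECT index, value FROM results "
        "WHERE row_id = ? AND index >= ? AND index < ?"
    )
```

Each `(start, end)` pair from `page_bounds` would be bound to the last two placeholders of `slice_query()`, so the same paging logic applies whether the row holds 50,000 columns or more than a collection could store.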

Thanks for the idea!
> CQL3 support for multi-column insert in a single operation (Batch Insert / Batch Mutate)
> ----------------------------------------------------------------------------------------
>                 Key: CASSANDRA-5959
>                 URL:
>             Project: Cassandra
>          Issue Type: New Feature
>          Components: Core, Drivers
>            Reporter: Les Hazlewood
>              Labels: CQL
> h3. Impetus for this Request
> (from the original [question on StackOverflow|]):
> I want to insert a single row with 50,000 columns into Cassandra 1.2.9. Before inserting,
I have all the data for the entire row ready to go (in memory):
> {code}
> +---------+------+------+------+------+-------+
> |         | 0    | 1    | 2    | ...  | 49999 |
> | row_id  +------+------+------+------+-------+
> |         | text | text | text | ...  | text  |
> +---------+------+------+------+------+-------+
> {code}
> The column names are integers, allowing slicing for pagination. The column values are
a value at that particular index.
> CQL3 table definition:
> {code}
> create table results (
>     row_id text,
>     index int,
>     value text,
>     primary key (row_id, index)
> ) 
> with compact storage;
> {code}
> As I already have the row_id and all 50,000 name/value pairs in memory, I just want to
insert a single row into Cassandra in a single request/operation so it is as fast as possible.
> The only thing I can seem to find is to execute the following 50,000 times:
> {code}
> INSERT INTO results (row_id, index, value) values (my_row_id, ?, ?);
> {code}
> where the first {{?}} is an index counter ({{i}}) and the second {{?}} is the text
value to store at location {{i}}.
> With the Datastax Java Driver client and C* server on the same development machine, this
took a full minute to execute.
> Oddly enough, the same 50,000 insert statements in a [Datastax Java Driver Batch|]
on the same machine took 7.5 minutes.  I thought batches were supposed to be _faster_ than
individual inserts?
> We tried instead with a Thrift client (Astyanax) and the same insert via a [MutationBatch|].
 This took _235 milliseconds_.
> h3. Feature Request
> As a result of this performance testing, this issue is to request that CQL3 support batch
mutation operations as a single operation (statement) to ensure the same speed/performance
benefits as existing Thrift clients.
> Example suggested syntax (based on the above example table/column family):
> {code}
> insert into results (row_id, (index,value)) values 
>     ((0,text0), (1,text1), (2,text2), ..., (N,textN));
> {code}
> Each value in the {{values}} clause is a tuple.  The first tuple element is the column
name, the second tuple element is the column value.  This seems to be the simplest and most accurate
representation of what happens during a batch insert/mutate.
> Not having this CQL feature forced us to remove the Datastax Java Driver (which we liked)
in favor of Astyanax because Astyanax supports this behavior.  We desire feature/performance
parity between Thrift and CQL3/Datastax Java Driver, so we hope this request improves both
CQL3 and the Driver.
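
To put the timings reported in the issue on a common scale, the following Python sketch converts each one into an average cost per column and a slowdown factor relative to the Thrift path. The three figures are the ones stated in the issue (one minute, 7.5 minutes, 235 milliseconds, all for 50,000 columns on a single development machine); the dictionary keys and function names are illustrative, and the output is plain arithmetic, not a new measurement.

```python
# Figures reported in the issue description for writing 50,000 columns.
COLUMNS = 50_000
TIMINGS_SECONDS = {
    "individual CQL3 inserts": 60.0,   # "a full minute"
    "CQL3 driver batch": 7.5 * 60,     # 7.5 minutes
    "Thrift MutationBatch": 0.235,     # 235 milliseconds
}


def per_column_ms(total_seconds, columns=COLUMNS):
    """Average cost of one column write, in milliseconds."""
    return total_seconds * 1000.0 / columns


def slowdown_vs(baseline, other):
    """How many times slower `other` was than `baseline`."""
    return TIMINGS_SECONDS[other] / TIMINGS_SECONDS[baseline]


if __name__ == "__main__":
    for name, seconds in TIMINGS_SECONDS.items():
        print(f"{name}: {per_column_ms(seconds):.4f} ms per column")
```

On these numbers, individual inserts average about 1.2 ms per column and the driver batch about 9 ms, against roughly 0.005 ms for the Thrift mutation, which works out to slowdowns of about 255x and 1,900x respectively.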

This message is automatically generated by JIRA.
If you think it was sent incorrectly, please contact your JIRA administrators