incubator-cassandra-user mailing list archives

From Les Hazlewood <lhazlew...@apache.org>
Subject CQL3 wide row and slow inserts - is there a single insert alternative?
Date Thu, 29 Aug 2013 19:04:33 GMT
Hi all,

We're using a Cassandra table/column family to store search results; it
looks like this:

+--------+---------+---------+---------+----
|        | 0       | 1       | 2       | ...
+--------+---------+---------+---------+----
| row_id | text... | text... | text... | ...

The column name is the index (an integer) of the entry's position in the
overall result set, and the value is the result at that index. This is
great because pagination becomes a simple slice query on the column name.
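
For example, fetching one page looks something like this (a sketch only;
the row id, offset, and page size are made up for illustration, and the
table definition appears further down):

```sql
-- Sketch: fetch one page of 25 results from a given shard.
-- (row_id, shard_num) identifies the partition; list_index is the
-- clustering column, so this is a simple slice on the column name.
SELECT list_index, result
FROM query_results
WHERE row_id = 'search-abc'   -- hypothetical row id
  AND shard_num = 0
  AND list_index >= 100       -- page start offset within the shard
LIMIT 25;
```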

Large result sets are split into multiple rows - we're limiting row
size on disk to be around 6 or 7 MB.  For our particular result
entries, this means we can get around 50,000 columns in a single row.

When we create the rows, we have the entire data available in the
application at the time the row insert is necessary.

Using CQL3, an initial implementation had one INSERT statement per
column.  This was killing performance (not to mention the # of
tombstones it created).
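
To make the problem concrete, the initial implementation boiled down to
something like this (a sketch of the slow approach; the row id and values
are illustrative):

```sql
-- Sketch of the initial (slow) approach: one INSERT per column.
-- The application loops over all ~50,000 results, computing
-- shard_num = i / 50000 and list_index = i % 50000 for result i,
-- then executes one statement per result:
INSERT INTO query_results (row_id, shard_num, list_index, result)
VALUES ('search-abc', 0, 0, 'first result...');
INSERT INTO query_results (row_id, shard_num, list_index, result)
VALUES ('search-abc', 0, 1, 'second result...');
-- ... 49,998 more statements for the same partition ...
```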

Here's the CQL3 table definition:

create table query_results (
    row_id text,
    shard_num int,
    list_index int,
    result text,
    primary key ((row_id, shard_num), list_index)
) with compact storage;

(The row key is row_id + shard_num. The clustering column is list_index.)

I don't want to execute 50,000 INSERT statements for a single row.  We
have all of the data up front - I want to execute a single INSERT.
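
The closest thing I've found is grouping the statements into a CQL BATCH
(a sketch; the row id and values are illustrative), but as far as I can
tell that is still one INSERT per column, just sent in one round trip:

```sql
-- Sketch: grouping the per-column INSERTs into a single batch.
-- Fewer round trips, but still one INSERT statement per column.
BEGIN UNLOGGED BATCH
  INSERT INTO query_results (row_id, shard_num, list_index, result)
  VALUES ('search-abc', 0, 0, 'first result...');
  INSERT INTO query_results (row_id, shard_num, list_index, result)
  VALUES ('search-abc', 0, 1, 'second result...');
  -- ... and so on for each column ...
APPLY BATCH;
```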

Is this possible?

We're using the Datastax Java Driver.

Thanks for any help!

Les
