incubator-cassandra-user mailing list archives

From Henrik Schröder <>
Subject Re: Range scan performance in 0.6.0 beta2
Date Fri, 26 Mar 2010 12:40:48 GMT
> So all the values for an entire index will be in one row?  That
> doesn't sound good.
> You really want to put each index [and each table] in its own CF, but
> until we can do that dynamically (0.7) you could at least make the
> index row keys a tuple of (indexid, indexvalue) and the column names
> in each row the object keys (empty column values).
> This works pretty well for a lot of users, including Digg.

We tested your suggestions like this:
We're using the OrderPreservingPartitioner.
We set the keycache and rowcache to 40%.
We're using the same machine as before, but we switched to a 64-bit JVM and
gave it 5GB of memory.
For each indexvalue we insert a row where the key is indexid + ":" +
indexvalue encoded as a hex string, and the row contains only one column,
where the name is the object key encoded as a byte array, and the value is
empty.
When reading, we call get_range_slice with an empty slice_range (start and
finish are zero-length byte arrays), a randomly generated start_key and
finish_key that we know have both been inserted, and a row_count of 1000.
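A small sketch of the key layout described above, assuming Python and a plain sorted list standing in for the row keys that an OrderPreservingPartitioner keeps in order. The helpers make_row_key and range_scan are hypothetical and are not part of Cassandra's Thrift API; range_scan only models the start_key/finish_key/row_count behaviour, and its inclusive bounds are an assumption. It shows why the hex encoding matters: lowercase hex keys sort in the same order as the raw index values, and all keys of one index share the "indexid:" prefix, so a key range maps to a contiguous range of index values.

```python
import bisect


def make_row_key(index_id: str, index_value: bytes) -> str:
    """Hypothetical helper: build a row key of the form <index_id>:<hex(index_value)>."""
    # bytes.hex() emits lowercase hex digits; in ASCII '0'-'9' sort before
    # 'a'-'f', so comparing the hex strings compares the underlying bytes.
    return index_id + ":" + index_value.hex()


def range_scan(sorted_keys, start_key, finish_key, row_count):
    """Return at most row_count keys k with start_key <= k <= finish_key.

    Models an order-preserving range scan over row keys; the inclusive
    bounds are an assumption made for this sketch.
    """
    lo = bisect.bisect_left(sorted_keys, start_key)
    hi = bisect.bisect_right(sorted_keys, finish_key)
    return sorted_keys[lo:min(hi, lo + row_count)]


if __name__ == "__main__":
    values = [b"\x00\xff", b"\x01", b"\x01\x00", b"\x7f", b"\x80", b"\xff"]
    keys = sorted(make_row_key("idx1", v) for v in values)

    # Hex-encoded keys sort in the same order as the raw byte values.
    assert keys == [make_row_key("idx1", v) for v in sorted(values)]

    # Scan from the key for b"\x01" to the key for b"\x80", at most 3 rows.
    print(range_scan(keys, make_row_key("idx1", b"\x01"),
                     make_row_key("idx1", b"\x80"), 3))
    # ['idx1:01', 'idx1:0100', 'idx1:7f']
```

Without the hex step, arbitrary bytes in the key would still need a string encoding, and an encoding that does not preserve byte order would make the start/finish range return the wrong index values.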

These are the numbers we got this time:
inserts (15 threads, batches of 10): 4000/second
get_range_slices (10 threads, row_count 1000): 50/second at start, down to
10/second at 250k inserts.
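As a rough upper bound on read throughput (simple arithmetic on the figures above, not a new measurement): each range call returns at most row_count rows, so the call rate times 1000 caps the rows returned per second. Calls whose random range covers fewer than 1000 rows return less. The load-time line assumes the 4000/second insert rate held for the whole load, which the numbers above do not state.

```python
# Figures reported in the message above.
INSERTS_PER_SEC = 4000          # 15 threads, batches of 10
ROW_COUNT = 1000                # row_count passed to each range call
RANGE_CALLS_PER_SEC = {"start": 50, "after 250k inserts": 10}


def max_rows_per_sec(calls_per_sec: int, row_count: int = ROW_COUNT) -> int:
    """Upper bound on rows returned per second: each call yields at most row_count rows."""
    return calls_per_sec * row_count


if __name__ == "__main__":
    for phase, calls in RANGE_CALLS_PER_SEC.items():
        print(f"{phase}: at most {max_rows_per_sec(calls):,} rows/second")
    # Time to load 250k rows, assuming the reported insert rate held throughout.
    print(f"250k inserts take about {250_000 / INSERTS_PER_SEC:.1f} s")
```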

These numbers are slightly better than our previous OPP attempts, but not
significantly. For what it's worth, when we only do writes, the machine
bottlenecks on disk I/O as expected, but whenever we do reads, it
bottlenecks on CPU instead. Is this expected?

Also, how would dynamic column families help us? In our tests, we only
tested a single "index", so even if we had one column family per "index", we
would still only write to one of them and then get the exact same results as
above, right?

We're really grateful for any help with both how to tune Cassandra and how
to design our data model. The designs we've tested so far are the best we
could come up with ourselves. All we really need is a way to store groups of
indexvalue->objectkey mappings and to get a range of object keys back, given
a group and a start and stop indexvalue.
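The requirement in the last paragraph can be written down as an in-memory reference model, for example to check what a Cassandra-backed implementation returns in tests. This is a hypothetical Python sketch, not a Cassandra API: GroupedIndex, put and range are illustrative names, bounds are taken as inclusive, and limit plays the role of row_count.

```python
import bisect
from collections import defaultdict


class GroupedIndex:
    """Hypothetical in-memory model: per group, store indexvalue -> objectkey
    mappings and return object keys whose index values fall in [start, stop]."""

    def __init__(self):
        # group -> list of (index_value, object_key) pairs, kept sorted
        self._groups = defaultdict(list)

    def put(self, group: str, index_value: bytes, object_key: bytes) -> None:
        entries = self._groups[group]
        pair = (index_value, object_key)
        i = bisect.bisect_left(entries, pair)
        if i == len(entries) or entries[i] != pair:  # skip exact duplicates
            entries.insert(i, pair)

    def range(self, group: str, start: bytes, stop: bytes, limit: int = 1000):
        """Object keys with start <= index_value <= stop, ordered by index value."""
        entries = self._groups.get(group, [])
        # (start,) sorts before every (start, object_key) pair.
        lo = bisect.bisect_left(entries, (start,))
        out = []
        for index_value, object_key in entries[lo:]:
            if index_value > stop or len(out) == limit:
                break
            out.append(object_key)
        return out


if __name__ == "__main__":
    idx = GroupedIndex()
    idx.put("users_by_age", b"\x20", b"user:1")
    idx.put("users_by_age", b"\x25", b"user:2")
    idx.put("users_by_age", b"\x30", b"user:3")
    print(idx.range("users_by_age", b"\x21", b"\x30"))  # [b'user:2', b'user:3']
```

Several object keys may share one index value here; they come back ordered by object key within that value, which mirrors column names sorted inside a row in the (indexid, indexvalue) layout suggested in the quoted reply.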

