Hi everyone,

We're trying to implement a virtual datastore for our users where they can set up "tables" and "indexes" to store objects and have them indexed on arbitrary properties. And we did a test implementation for Cassandra in the following way:

Objects are stored in one columnfamily, each key is made up of tableid + "object key", and each row has one column where the value is the serialized object. This part is super-simple, we're just using Cassandra as a key-value-store, and this part performs really well.

The indexes are a bit tricker, but basically for each index and each object that is stored, we compute a fixed-length bytearray based on the object that make up the indexvalue. We then store these bytearray indexvalues in another columnfamily, with the indexid as row key, the indexvalue as the column name, and the object key as the column value.

The idea is then that to perform a range query on an "index" in this virtual datastore, we do a get_slice to get the range of indexvalues and their corresponding object keys, and we can then multi_get the actual objects from the other column family.

Since these virtual tables and indexes will be created by our users the whole system has to be very dynamic, and we can't make any assumptions about the actual objects they will store and the distribution of these. We do know that it must be able to scale well, and this is what attracted us to Cassandra in the first place. We do however have some performance targets we want to hit, we have one use-case where there will be about 30 million records in a "table", and knowing that it can go up to 100 million records would be nice. As for speed, would like to get thousands of writes and range reads per second.

Given these requirements and our design, we will then have rows in Cassandra with millions of columns, from which we want to fetch large column slices. We set it all up on a single developer machine (MacPro, QuadCore 2.66ghz) running Windows, and we used the thrift compiler to generate a C# client library. We tested just the "index" part of our design, and these are the numbers we got:
inserts (15 threads, batches of 10): 4000/second
get_slices (10 threads, random range sizes, count 1000): 50/second at start, dies at about 6 million columns inserted. (OutOfMemoryException)
get_slices (10 threads, random range sizes, count 10): 200/s at start, slows down the more columns there are.

When we saw that the above results were bad, we tried a different approach storing the indexvalues in the key instead, using the OrderPreservingPartitioner and using get_range_slice to get ranges of rows, but we got even worse results:
inserts (15 threads, in batches of 10): 4000/second
get_range_slice (10 threads, random key ranges, count 1000): 20/second at start, 5/second with 30 million rows

Finally, we did a similar test using MySQL instead and then we got these numbers:
inserts (15 threads, in batches of 10): 4000/second
select (10 threads, limit 1000): 500/second

So for us, the MySQL version delivers the speed that we want, but none of the scaling that Cassandra gives us. We set up our columnfamilies like this:

<ColumnFamily CompareWith="BytesType" Name="Objects" RowsCached="0" KeysCached="0"/>
<ColumnFamily CompareWith="BytesType" Name="Indexes" RowsCached="0" KeysCached="0"/>

And we now have these questions:
a) Is there a better way of structuring our data and building the virtual indexes?
b) Are our Cassandra numbers too low? Or is this the expected performance?
c) Did we miss to change some important setting (in the conf xml or java config) since our rows are this large?
d) Can we avoid hitting the Out of memory exception?