Thanks for the answer. I wanted to report on the improvements I got, because someone else is bound to run into the same questions.
> C) I want to access a key that is at the 50th position in that table,
> Cassandra will seek position 0 and then do a sequential read of the file
> from there until it finds the key, right ?
Sequential read of the index file, not the data file.
And then it will seek directly to the right position in the data file?
> J) I've considered writing a partitioner that will chunk the rows together
> so that queries for "close" rows go to the same replica on the ring. Since
> the rows have close keys, they will be close together in the file and this
> will increase OS cache efficiency.
Sounds like ByteOrderedPartitioner to me.
I did indeed end up using just that.
> What do you think ?
I think you should strongly consider denormalizing so that you can
read ranges from a single row instead.
Yes, that's what I did: I took a hard look at the data and the access pattern and sliced away everything I could.
Given that I am storing data in a quadtree and that I have strong locality in my read pattern, I ended up using the Morton (Z-order) code as the row key and using super-columns to fetch only the column groups I'm interested in.
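For anyone unfamiliar with Morton codes, here is a minimal sketch of the bit interleaving in Python. The function names, the 16-bit coordinate width, and the choice of putting x in the even bits are illustrative assumptions, not the actual key layout used in this cluster.

```python
def morton_encode(x: int, y: int, bits: int = 16) -> int:
    """Interleave the low `bits` bits of x and y into a single integer.

    Assumption for this sketch: bits of x land in even positions,
    bits of y in odd positions.
    """
    code = 0
    for i in range(bits):
        code |= ((x >> i) & 1) << (2 * i)      # bit i of x -> position 2i
        code |= ((y >> i) & 1) << (2 * i + 1)  # bit i of y -> position 2i + 1
    return code


def morton_key(x: int, y: int, bits: int = 16) -> bytes:
    """Encode the Morton code as fixed-width big-endian bytes.

    Fixed width plus big-endian order means comparing keys byte by byte
    gives the same order as comparing the integer codes.
    """
    width = (2 * bits + 7) // 8
    return morton_encode(x, y, bits).to_bytes(width, "big")
```

With a byte-ordered partitioner, cells that are close in 2D space tend to get keys that sort close together, which is what makes the OS cache effective here.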
I gave some thought to how to balance the tree, because I have 10 different levels of data in the quadtree and I am doing tricks with shifts to reuse the same prefixes in the keys.
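One plausible reading of the shift trick: each quadtree level adds two bits to a cell's Morton code, so shifting a code right by two bits per level yields the code of the enclosing cell, and all cells under one ancestor share that ancestor's code as a prefix. The sketch below assumes this layout; the helper names are hypothetical and the real key scheme may differ.

```python
def ancestor_code(code: int, level: int, target_level: int) -> int:
    """Return the Morton code of the ancestor cell at `target_level`.

    `code` is the Morton code of a cell at depth `level`, where level 0
    is the root and each deeper level appends two bits.
    """
    if not 0 <= target_level <= level:
        raise ValueError("target_level must be between 0 and level")
    # Drop two bits for every level we move up the tree.
    return code >> (2 * (level - target_level))


def shares_prefix(code_a: int, code_b: int, level: int, prefix_level: int) -> bool:
    """True if two cells at depth `level` lie under the same ancestor
    at depth `prefix_level`."""
    return (ancestor_code(code_a, level, prefix_level)
            == ancestor_code(code_b, level, prefix_level))
```

For example, two level-3 cells with codes 0b110110 and 0b110101 share the level-2 ancestor 0b1101, so their keys start with the same prefix.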
What I think is worth noting for others on the mailing list is that doing this resulted in a 50x to 100x increase in read performance, and my IO is now down to virtually nothing (I can basically see the OS load the pages into its cache).
I also found out that one big multiget is more efficient than a couple of range queries in my case.
- instead of a steady rate of 280/350MB/s of disk reads I get 100MB/s every so often
- instead of seeing my cluster melt down at 3 concurrent clients, it's now speeding along just fine at 50 concurrent clients