On Tue, Feb 16, 2010 at 2:32 AM, Dr. Martin Grabmüller <Martin.Grabmueller@eleven.de> wrote:
In my tests I have observed that good read latency depends on keeping
the number of data files low.  In my current test setup, I have stored
1.9 TB of data on a single node, which is in 21 data files, and read
latency is between 10 and 60ms (for small reads; larger reads of course
take more time).  In earlier stages of my test, I had up to 5000
data files, and read performance was quite bad: my configured 10-second
RPC timeout was regularly encountered.

I believe it's a known issue that a read crossing sstables is O(N log N), but I can't find the ticket for it at the moment.  Perhaps Stu Hood will jump in and enlighten me, but in any case I believe https://issues.apache.org/jira/browse/CASSANDRA-674 will eventually solve it.

Keeping write volume low enough that compaction can keep up is one solution, and throwing hardware at the problem is another, if necessary.  Also, the row caching in trunk (soon to be 0.6 we hope) helps greatly for repeat hits.
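
To make the file-count effect concrete, here is a toy model in Python. It is only a sketch and not Cassandra's actual read path: the `DataFile` type and the `read` and `compact` functions are hypothetical names invented for illustration, and it ignores any per-file filtering or indexing a real store would do. It only shows that a read has to consult every file that might hold a newer version of a key, and that merging the files shrinks that work.

```python
from typing import Dict, List, Optional, Tuple

# Hypothetical toy model: one "data file" maps key -> (write_timestamp, value).
DataFile = Dict[str, Tuple[int, str]]


def read(files: List[DataFile], key: str) -> Tuple[Optional[str], int]:
    """Return (newest value for key, number of files probed)."""
    newest: Optional[Tuple[int, str]] = None
    probes = 0
    for f in files:
        probes += 1  # any file may hold a newer version, so none is skipped
        hit = f.get(key)
        if hit is not None and (newest is None or hit[0] > newest[0]):
            newest = hit
    return (newest[1] if newest else None), probes


def compact(files: List[DataFile]) -> List[DataFile]:
    """Merge all files into one, keeping the newest version of each key."""
    merged: DataFile = {}
    for f in files:
        for key, versioned in f.items():
            if key not in merged or versioned[0] > merged[key][0]:
                merged[key] = versioned
    return [merged]


files = [{"k": (1, "old")}, {"other": (2, "x")}, {"k": (3, "new")}]
print(read(files, "k"))           # probes all three files
print(read(compact(files), "k"))  # probes the single merged file
```

In this model the per-read work grows with the number of files until a merge happens, which is why writes that outpace compaction show up as read latency.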

-Brandon