incubator-cassandra-dev mailing list archives

From Carl Bruecken <>
Subject improving read performance
Date Mon, 20 Sep 2010 14:48:48 GMT

The Cassandra FAQ explains why reads are slower than writes as 
follows:

This drawback is unfortunate for systems that use time-based row 
keys. In such systems, row data is generally fragmented very little, 
if at all, yet reads suffer because the read path assumes that all 
data is fragmented. Furthermore, in a real-time system where reads 
follow quickly after writes, the sstables are still checked even when 
the data is in memory.

I've been working on a patch that I hope will make read performance 
comparable to write performance, if not faster in the cases where the 
reads involve no disk access. The assumption is that, for a time-based 
row key, the data will be fragmented only at the edges of memtable 
flushes. Therefore, only 2 reads need to occur: either against the 
current memtable in memory and the newest sstable, or against 2 
adjacent sstables. For real-time reads, I've further split the single 
memtable into 2 memtables so that the 2 required reads happen against 
2 memtables. The read algorithm is to search until the first fragment 
is found and then read only from the adjacent memtable or sstable.
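
To make the read algorithm concrete, here is a minimal sketch in Python. 
It is not the patch itself. It assumes that the stores (memtables and 
sstables) are modelled as plain dicts ordered newest first, that the 
search runs from newest to oldest, and that "adjacent" means the next 
older store. The name read_row and the Row/Store types are hypothetical 
and used only for illustration.

```python
from typing import Dict, List, Optional

Row = Dict[str, str]      # column name -> value
Store = Dict[str, Row]    # row key -> fragment of that row held by this store


def read_row(stores: List[Store], key: str) -> Optional[Row]:
    """Read a row, touching at most two stores holding fragments of it.

    `stores` is assumed to be ordered newest first, for example
    [memtable_1, memtable_2, newest_sstable, ..., oldest_sstable].
    """
    for i, store in enumerate(stores):
        fragment = store.get(key)
        if fragment is None:
            continue  # keep searching for the first fragment

        merged: Row = {}
        # Assumption: a time-based row is split only across a flush
        # boundary, so the only other fragment is in the adjacent store.
        if i + 1 < len(stores):
            merged.update(stores[i + 1].get(key, {}))
        # Apply the newer fragment last so its columns win on conflict.
        merged.update(fragment)
        return merged  # stores older than i + 1 are never consulted

    return None  # no store holds the key
```

With stores [{"t3": {"a": "1"}}, {"t3": {"b": "2"}}, {"t3": {"c": "9"}}], 
read_row(stores, "t3") merges only the first two fragments and never 
looks at the third store. That is exactly the risk in the assumption: a 
row fragmented across more than two stores would be read incompletely.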

I haven't uncovered any showstoppers with this approach yet. I'm 
posting this message in the hope that someone will alert me to any 
flaws they detect.
