I ran into a different problem with the row cache recently and sent a message to the list, but it didn't get picked up. I am hoping someone can help me understand the issue. Our data also has rather wide rows, not necessarily in the thousands range, but definitely in the upper hundreds. They are hosted in v1.1.1. I was doing a performance test and enabled a 1GB off-heap row cache on each of our Cassandra nodes (each node has at least 16GB of memory). The test code requested a fixed set of 5000 rows from the cluster and ran a few times, but according to nodetool info the row cache hit rate was very low, and a few of the nodes had 0 hits even though the row cache was full.
A Cassandra JVM will generally not function well with caches and wide rows. Probably the most important thing to understand is Ed's point, that the row cache caches the entire row, not just the slice that was read out. What you've seen is almost exactly the behaviour I'd expect from enabling either cache provider over wide rows.
- the on-heap cache will result in evictions that crush the JVM trying to manage garbage. This is also the case if the rows have an uneven size distribution (as small rows can push out a single large row, large rows push out many small ones, etc).
- the off-heap cache will spend a lot of time serializing and deserializing wide rows, such that it can increase latency relative to just reading from disk and leveraging the filesystem's cache directly.
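To make the uneven-size point concrete, here is a minimal sketch of a cache bounded by total bytes with a plain LRU policy. It is a toy model, not Cassandra's actual cache implementation: the class name ByteBoundedLRU, the 1MB capacity and the row sizes are all made up for illustration.

```python
from collections import OrderedDict


class ByteBoundedLRU:
    """Toy LRU cache bounded by total bytes (hypothetical model, not Cassandra code)."""

    def __init__(self, capacity_bytes):
        self.capacity_bytes = capacity_bytes
        self.used_bytes = 0
        self.rows = OrderedDict()  # row key -> size in bytes, oldest entry first

    def put(self, key, size_bytes):
        """Cache a whole row and return the keys evicted to make room for it."""
        evicted = []
        if key in self.rows:
            self.used_bytes -= self.rows.pop(key)
        if size_bytes > self.capacity_bytes:
            return evicted  # the row can never fit, so it is not cached
        while self.used_bytes + size_bytes > self.capacity_bytes:
            old_key, old_size = self.rows.popitem(last=False)
            self.used_bytes -= old_size
            evicted.append(old_key)
        self.rows[key] = size_bytes
        self.used_bytes += size_bytes
        return evicted


cache = ByteBoundedLRU(capacity_bytes=1_000_000)
for i in range(100):
    cache.put(f"small-{i}", 10_000)  # 100 small rows fill the cache exactly

evicted_by_wide = cache.put("wide", 600_000)  # one wide row arrives
print(len(evicted_by_wide))  # 60
```

In this model a single wide row throws out 60 small ones, and a burst of small rows afterwards would push the wide row straight back out. With an on-heap cache, each of those evictions is garbage for the collector to deal with.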
The cache resizing behaviour does exist to preserve the server's memory, but it can also cause a death spiral in the on-heap case, because a relatively smaller cache may result in data being evicted more frequently. I've seen cases where sizing up the cache can stabilise a server's memory.
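As a rough illustration of why a cache that is slightly too small can be so much worse than one that fits the working set, the sketch below replays the same keys in the same order against a plain LRU cache. It assumes equal-sized rows and a strict LRU policy, both simplifications, and the function name cyclic_hit_rate and the numbers are invented for the example. It is not a diagnosis of any particular cluster.

```python
from collections import OrderedDict


def cyclic_hit_rate(capacity_rows, working_set_rows, passes):
    """Hit rate of a plain LRU cache when the same keys are read in the same order."""
    cache = OrderedDict()
    hits = lookups = 0
    for _ in range(passes):
        for key in range(working_set_rows):
            lookups += 1
            if key in cache:
                hits += 1
                cache.move_to_end(key)  # mark as most recently used
            else:
                cache[key] = True
                if len(cache) > capacity_rows:
                    cache.popitem(last=False)  # evict the least recently used key
    return hits / lookups


for capacity in (1000, 4000, 5000):
    print(capacity, cyclic_hit_rate(capacity, working_set_rows=5000, passes=4))
# 1000 0.0
# 4000 0.0
# 5000 0.75
```

Under these assumptions a cache holding 80% of the working set gets no hits at all, because every row is evicted just before it is read again, while the cache stays full the whole time. Once the working set fits, every pass after the first is served from cache.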
This isn't just a Cassandra thing; it simply happens to be very evident with that system. Generally, to get an effective benefit from a cache, the data should be consistently sized and not too large, to allow effective cache 'lining'.
On 02/12/12 21:36, Mike wrote:
We recently hit an issue within our Cassandra based application. We
have a relatively new Column Family with some very wide rows (10's of
thousands of columns, or more in some cases). During a periodic
activity, we walk the range of columns to retrieve various pieces of
information, a segment at a time.
We do these same queries frequently at various stages of the process,
and I thought the application could see a performance benefit from row
caching. We have a small row cache (100MB per node) already enabled,
and I enabled row caching on the new column family.
The results were very negative. When performing range queries with a
limit of 200 results, for a small minority of the rows in the new column
family, performance plummeted. CPU utilization on the Cassandra node
went through the roof, and it started chewing up memory. Some queries
to this column family hung completely.
According to the logs, we started getting frequent GCInspector
messages. Cassandra started flushing the largest memtables due to
hitting the "flush_largest_memtables_at" of 75%, and scaling back the
key/row caches. However, to Cassandra's credit, it did not die with an
OutOfMemory error. Its emergency measures to conserve
memory worked, and the cluster stayed up and running. No real errors
showed in the logs, except for messages getting dropped, which I believe
was caused by what was going on with CPU and memory.
Disabling row caching on this new column family has resolved the issue
for now, but is there something fundamental about row caching that I am
missing?
We are running Cassandra 1.1.2 with a 6 node cluster, with a replication
factor of 3.