incubator-cassandra-user mailing list archives

From aaron morton <>
Subject Re: Row caching + Wide row column family == almost crashed?
Date Tue, 04 Dec 2012 09:47:59 GMT
I responded on your other thread. 


Aaron Morton
Freelance Cassandra Developer
New Zealand


On 4/12/2012, at 5:31 PM, Yiming Sun <> wrote:

> I ran into a different problem with the row cache recently and sent a message to the list, but
it didn't get picked up.  I am hoping someone can help me understand the issue.  Our data
also has rather wide rows, not necessarily in the thousands range, but definitely in the upper
hundreds.  They are hosted on v1.1.1.  I was doing a performance test and enabled an off-heap
row cache of 1GB on each of our Cassandra nodes (each node has at least 16GB of memory).
 The test code requested a fixed set of 5000 rows from the cluster and ran a few times,
but according to nodetool info, the row cache hit rate was very low, and a few of the nodes had
0 hits even though the row cache was full.
> So what I am trying to understand is: how can the row cache be full but have 0 hits?
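
One general way a cache can be full yet record no hits is a cyclic scan over a working set larger than the cache. The sketch below is a toy LRU cache bounded by entry count; it assumes a plain LRU policy and is not Cassandra's row cache implementation. The names `LRUCache` and `run` are hypothetical, and whether this pattern explains the nodes described above is not established by the thread.

```python
from collections import OrderedDict


class LRUCache:
    """Toy LRU cache bounded by entry count (hypothetical, not Cassandra code)."""

    def __init__(self, capacity):
        self.capacity = capacity
        self.entries = OrderedDict()  # oldest entry first
        self.hits = 0
        self.requests = 0

    def get(self, key):
        self.requests += 1
        if key in self.entries:
            self.hits += 1
            self.entries.move_to_end(key)  # mark as most recently used
            return
        # Miss: load the entry, then evict the least recently used one if over capacity.
        self.entries[key] = True
        if len(self.entries) > self.capacity:
            self.entries.popitem(last=False)


def run(capacity, num_rows, passes):
    """Request rows 0..num_rows-1 in the same order, `passes` times."""
    cache = LRUCache(capacity)
    for _ in range(passes):
        for row in range(num_rows):
            cache.get(row)
    return cache


if __name__ == "__main__":
    too_small = run(capacity=4000, num_rows=5000, passes=3)
    print(len(too_small.entries), too_small.hits)

    big_enough = run(capacity=5000, num_rows=5000, passes=3)
    print(len(big_enough.entries), big_enough.hits)
```

With room for only 4000 of the 5000 rows, every row is evicted before it is requested again, so the cache stays full while the hit count stays at zero. Once the whole set fits, every request after the first pass is a hit.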
> On Mon, Dec 3, 2012 at 6:55 PM, Bill de hÓra <> wrote:
> A Cassandra JVM will generally not function well with caches and wide rows. Probably
the most important thing to understand is Ed's point, that the row cache caches the entire
row, not just the slice that was read out. What you've seen is almost exactly the observed
behaviour I'd expect with enabling either cache provider over wide rows.
>  - the on-heap cache will result in evictions that crush the JVM trying to manage garbage.
This is also the case if the rows have an uneven size distribution (as small rows can push
out a single large row, large rows push out many small ones, etc).
>  - the off heap cache will spend a lot of time serializing and deserializing wide rows,
such that it can increase latency relative to just reading from disk and leveraging the filesystem's
cache directly.
> The cache resizing behaviour does exist to preserve the server's memory, but it can also
cause a death spiral in the on-heap case, because a relatively smaller cache may result in
data being evicted more frequently.  I've seen cases where sizing up the cache can stabilise
a server's memory.
> This isn't just a Cassandra thing, it simply happens to be very evident with that system
- generally, to get an effective benefit from a cache, the data should be consistently sized
and not too large to allow effective cache 'lining'.
> Bill
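
The point that the row cache holds the entire row, not just the slice that was read, can be illustrated with a toy model. This is a minimal sketch assuming an LRU cache with a size budget counted in columns; `WholeRowCache` and `read_slice` are hypothetical names, and the model ignores serialization and garbage collection costs entirely.

```python
from collections import OrderedDict


class WholeRowCache:
    """Toy LRU cache with a size budget measured in columns (hypothetical)."""

    def __init__(self, budget):
        self.budget = budget
        self.rows = OrderedDict()  # row key -> row width in columns, oldest first
        self.used = 0
        self.evictions = 0

    def read_slice(self, key, row_width, slice_len):
        """Return how many columns the caller gets back.

        On a miss the WHOLE row is charged to the cache, no matter how
        small the requested slice is.
        """
        if key in self.rows:
            self.rows.move_to_end(key)
            return min(slice_len, row_width)
        self.rows[key] = row_width
        self.used += row_width
        # Evict least recently used rows until the budget is met again,
        # always keeping the row that was just loaded.
        while self.used > self.budget and len(self.rows) > 1:
            _, width = self.rows.popitem(last=False)
            self.used -= width
            self.evictions += 1
        return min(slice_len, row_width)


if __name__ == "__main__":
    cache = WholeRowCache(budget=100_000)
    for i in range(500):
        cache.read_slice(f"small-{i}", row_width=150, slice_len=150)
    # A 200-column slice from one 50,000-column row.
    returned = cache.read_slice("wide", row_width=50_000, slice_len=200)
    print(returned, cache.evictions, cache.used)
```

In this run the caller receives 200 columns, but the cache is charged for all 50,000, and 167 of the small rows are evicted to make room.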
> On 02/12/12 21:36, Mike wrote:
> Hello,
> We recently hit an issue within our Cassandra based application.  We
> have a relatively new Column Family with some very wide rows (10's of
> thousands of columns, or more in some cases).  During a periodic
> activity, we query the range of columns to retrieve various pieces of
> information, a segment at a time.
> We do these same queries frequently at various stages of the process,
> and I thought the application could see a performance benefit from row
> caching.  We have a small row cache (100MB per node) already enabled,
> and I enabled row caching on the new column family.
> The results were very negative.  When performing range queries with a
> limit of 200 results, for a small minority of the rows in the new column
> family, performance plummeted.  CPU utilization on the Cassandra node
> went through the roof, and it started chewing up memory.  Some queries
> to this column family hung completely.
> According to the logs, we started getting frequent GCInspector
> messages.  Cassandra started flushing the largest memtables due to
> hitting the "flush_largest_memtables_at" of 75%, and scaling back the
> key/row caches.  However, to Cassandra's credit, it did not die with an
> OutOfMemory error.  Its emergency measures to conserve
> memory worked, and the cluster stayed up and running.  No real errors
> showed in the logs, except for messages getting dropped, which I believe
> was caused by what was going on with CPU and memory.
> Disabling row caching on this new column family has resolved the issue
> for now, but, is there something fundamental about row caching that I am
> missing?
> We are running Cassandra 1.1.2 with a 6 node cluster, with a replication
> factor of 3.
> Thanks,
> -Mike
