cassandra-user mailing list archives

From Yiming Sun <>
Subject Re: Row caching + Wide row column family == almost crashed?
Date Tue, 04 Dec 2012 15:24:55 GMT
Yup, got it.  Thanks Aaron.

On Tue, Dec 4, 2012 at 4:47 AM, aaron morton <> wrote:

> I responded on your other thread.
> Cheers
>    -----------------
> Aaron Morton
> Freelance Cassandra Developer
> New Zealand
> @aaronmorton
> On 4/12/2012, at 5:31 PM, Yiming Sun <> wrote:
> I ran into a different problem with Row cache recently, sent a message to
> the list, but it didn't get picked up.  I am hoping someone can help me
> understand the issue.  Our data also has rather wide rows, not necessarily
> in the thousands range, but definitely in the upper-hundreds levels.   They
> are hosted in v1.1.1.   I was doing a performance test and enabled off-heap
> row cache of 1GB for each of our Cassandra nodes (each node has at least
> 16GB of memory).   The test code was requesting a fixed set of 5000 rows
> from the cluster and ran a few times, but using nodetool info,  the row
> cache hit rate was very low, and a few of the nodes had 0 hits even though
> the row cache was full.
> So what I was trying to understand is how the row cache can be full but
> have 0 hits?
> On Mon, Dec 3, 2012 at 6:55 PM, Bill de hÓra <> wrote:
>> A Cassandra JVM will generally not function well with caches and
>> wide rows. Probably the most important thing to understand is Ed's point,
>> that the row cache caches the entire row, not just the slice that was read
>> out. What you've seen is almost exactly the observed behaviour I'd expect
>> with enabling either cache provider over wide rows.
>>  - the on-heap cache will result in evictions that crush the JVM trying
>> to manage garbage. This is also the case if the rows have an uneven size
>> distribution (as small rows can push out a single large row, large rows
>> push out many small ones, etc).
>>  - the off heap cache will spend a lot of time serializing and
>> deserializing wide rows, such that it can increase latency relative to just
>> reading from disk and leveraging the filesystem's cache directly.
>> The cache resizing behaviour does exist to preserve the server's memory,
>> but it can also cause a death spiral in the on-heap case, because a
>> relatively smaller cache may result in data being evicted more frequently.
>>  I've seen cases where sizing up the cache can stabilise a server's memory.
>> This isn't just a Cassandra thing, it simply happens to be very evident
>> with that system - generally to get an effective benefit from a cache, the
>> data should be contiguously sized and not too large to allow effective
>> cache 'lining'.
>> Bill
>> On 02/12/12 21:36, Mike wrote:
>>> Hello,
>>> We recently hit an issue within our Cassandra based application.  We
>>> have a relatively new Column Family with some very wide rows (tens of
>>> thousands of columns, or more in some cases).  During a periodic
>>> activity, we query the range of columns to retrieve various pieces of
>>> information, a segment at a time.
>>> We do these same queries frequently at various stages of the process,
>>> and I thought the application could see a performance benefit from row
>>> caching.  We have a small row cache (100MB per node) already enabled,
>>> and I enabled row caching on the new column family.
>>> The results were very negative.  When performing range queries with a
>>> limit of 200 results, for a small minority of the rows in the new column
>>> family, performance plummeted.  CPU utilization on the Cassandra node
>>> went through the roof, and it started chewing up memory.  Some queries
>>> to this column family hung completely.
>>> According to the logs, we started getting frequent GCInspector
>>> messages.  Cassandra started flushing the largest mem_tables due to
>>> hitting the "flush_largest_memtables_at" of 75%, and scaling back the
>>> key/row caches.  However, to Cassandra's credit, it did not die with an
>>> OutOfMemory error.  Its emergency measures to conserve
>>> memory worked, and the cluster stayed up and running.  No real errors
>>> showed in the logs, except for messages getting dropped, which I believe
>>> was caused by what was going on with CPU and memory.
>>> Disabling row caching on this new column family has resolved the issue
>>> for now, but, is there something fundamental about row caching that I am
>>> missing?
>>> We are running Cassandra 1.1.2 with a 6 node cluster, with a replication
>>> factor of 3.
>>> Thanks,
>>> -Mike
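
To make Bill's point about whole-row caching concrete, here is a minimal sketch of a size-bounded LRU cache that is charged for the entire row whenever any slice of it is read. It is a toy model only: the class WholeRowLRUCache, the demo() function, and the sizes used are hypothetical, and it does not reproduce Cassandra's actual cache providers or their eviction policy. It only illustrates how one wide row can push many small rows out of a small cache.

```python
from collections import OrderedDict


class WholeRowLRUCache:
    """Toy LRU cache bounded by total size; NOT Cassandra's implementation."""

    def __init__(self, capacity):
        self.capacity = capacity      # total size budget (arbitrary units)
        self.used = 0                 # size currently held
        self.rows = OrderedDict()     # row key -> full row size, oldest first
        self.hits = 0
        self.misses = 0
        self.evictions = 0

    def read_slice(self, key, full_row_size):
        """Read some slice of a row; the cache stores the whole row."""
        if key in self.rows:
            self.rows.move_to_end(key)   # mark as most recently used
            self.hits += 1
            return
        self.misses += 1
        if full_row_size > self.capacity:
            return                       # row can never fit; do not cache it
        # Evict least recently used rows until the whole row fits.
        while self.used + full_row_size > self.capacity:
            _, evicted_size = self.rows.popitem(last=False)
            self.used -= evicted_size
            self.evictions += 1
        self.rows[key] = full_row_size
        self.used += full_row_size


def demo():
    """Assumed workload: 50 small rows (size 1) and one wide row (size 80)."""
    cache = WholeRowLRUCache(capacity=100)
    small_keys = list(range(50))

    for k in small_keys:                 # pass 1: warm the cache (all misses)
        cache.read_slice(k, 1)

    before = cache.hits
    for k in small_keys:                 # pass 2: everything fits
        cache.read_slice(k, 1)
    hits_pass2 = cache.hits - before

    cache.read_slice("wide", 80)         # one slice read of a wide row

    before = cache.hits
    for k in small_keys:                 # pass 3: same reads as pass 2
        cache.read_slice(k, 1)
    hits_pass3 = cache.hits - before

    return hits_pass2, hits_pass3, cache.evictions


if __name__ == "__main__":
    print(demo())
```

In this toy run the second pass hits on all 50 small rows, while the third pass, after a single read against the wide row, hits on none of them. The wide row evicts 30 small rows, and the sequential re-reads then keep evicting the remaining entries just before they are needed. Real workloads and real cache implementations will behave differently in detail, but the sketch shows why uneven row sizes make a small row cache churn.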
