incubator-cassandra-user mailing list archives

From Peter Schuller <peter.schul...@infidyne.com>
Subject Re: Nodes getting slowed down after a few days of smooth operation
Date Mon, 11 Oct 2010 17:42:18 GMT
> 170141183460469231731687303715884105727
> 192.168.252.88    Up         10.07 GB

Firstly, I second the point raised about the row cache size (very
frequent concurrent GCs are definitely an indicator that the JVM heap
size is too small, and the row cache seems like a likely contender -
especially given that you say it builds up over days). Note though
that you have to look at the GCInspector's output with respect to the
concurrent mark/sweep GC phases to judge the live set in your heap,
rather than system memory. Attaching with jconsole or visualvm to the
JVM will also give you a pretty good view of what's going on. Look for
the heap usage as it appears after one of the "major" dips in the
graph (not the regular sawtooth dips, which are young generation
collections and won't help indicate actual live set).
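For example, something along these lines will show both (the log path
is just the common default, adjust for your install):

  # Each GCInspector line for ConcurrentMarkSweep reports how much heap
  # is still used after the collection -- roughly your live set.
  grep GCInspector /var/log/cassandra/system.log | grep ConcurrentMarkSweep

  # Or watch old gen occupancy directly with jstat; the O column right
  # after a CMS cycle finishes is the same thing as a percentage.
  jstat -gcutil <cassandra-pid> 5000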

That said, with respect to caching effects: your total data size seems
to be in the same ballpark as memory. Your maximum heap size is 6 GB;
on a 16 GB machine, taking into account various overhead, maybe you've
got something like 8 GB left for buffer cache? It doesn't sound
strange at all that there would be a significant difference between a
32 GB machine and a 16 GB machine with your ~10 GB data size, since
buffer cache goes from "slightly below data size" to "almost three
times data size". Especially when major or almost-major compactions
are triggered: on the small machine you would expect to evict
essentially everything from cache during a compaction (except what is
touched *during* the compaction), while on the larger machine the
newly written sstables effectively fit in the cache too.
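To get a rough idea of how much room the page cache actually has
versus your on-disk data (the data directory below is the default,
adjust if yours differs):

  # "cached" is approximately what the kernel can use for the page cache
  free -m

  # total on-disk sstable size to compare against
  du -sh /var/lib/cassandra/data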

But note that these are two pretty different conditions; the first is
about making sure your JVM heap size is appropriate. The second can be
tested for by observing I/O load (iostat -x -k 1) and correlating with
compactions. So e.g., what's the average utilization and queue size in
iostat just before a compaction vs. just after it? That difference
should be due to cache eviction (assuming you're not servicing a
built-up backlog). There is also the impact of the compaction itself
while it is running and the I/O it generates. In general, the higher your
disk load is prior to compaction, the less margin there is to deal
with compaction happening concurrently.
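A concrete way to do that correlation (column names are from sysstat's
iostat; the log path is again the common default):

  # watch %util and avgqu-sz; sustained %util near 100 with a growing
  # queue means the disk can't keep up
  iostat -x -k 1

  # compaction start/finish lines in the Cassandra log give you the
  # timestamps to line up against the iostat output
  grep -i compact /var/log/cassandra/system.log | tail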

In general, whether or not you are willing to make the assumption that
actively used data fits in RAM will severely affect the hardware
requirements for serving your load.

-- 
/ Peter Schuller
