From "Dan Hendry" <>
Subject Memtable flush thresholds - what am I missing?
Date Thu, 18 Aug 2011 19:43:37 GMT
I am in the process of trying to tune the memtable flush thresholds for a
particular column family (super column family to be specific) in my
Cassandra 0.8.1 cluster. This CF is reasonably heavily used and getting
flushed roughly every 5-8 minutes which is hardly optimal, particularly
given I have JVM memory to spare at the moment. I am trying to understand
the Cassandra logs but the numbers I am seeing are not making any sense.


The initial memtable settings for this CF were throughput = 70 MB and
operations = 0.7 million. The flush messages I was seeing in the logs
(after a "flushing high-traffic column family" message for this CF) looked
like this:

                "Enqueuing flush of Memtable-.... (17203504/600292480 serialized/live bytes, 320432 ops)"


So... uh... ~17 MB serialized, ~600 MB live (whatever that means), and ~320k
ops; the resulting sstables are ~34 MB. This is roughly what every flush
looks like. Two minutes before this particular flush, a GC-triggered
StatusLogger entry shows ops and data for the CF as "122592,230094268", or
122k ops (sensible) and 230 MB (what???). For at least 2 minutes prior to
THAT message, nothing else happened (flushes, compaction, etc.) for any column
family, which means that this series of events (flush, then GC log entry,
then flush) is reasonably isolated from any other activity.
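
To make the size discrepancy concrete, here is a minimal Python sketch that pulls the three numbers out of the flush message quoted above and computes the live/serialized ratio. The regex and the parse_flush_line helper are my own illustration (not anything Cassandra ships), and they assume the log line keeps exactly the "(X/Y serialized/live bytes, N ops)" shape shown above.

```python
import re

# Hypothetical helper: matches the "(X/Y serialized/live bytes, N ops)" part
# of the flush message quoted in this post.
FLUSH_RE = re.compile(r"\((\d+)/(\d+) serialized/live bytes, (\d+) ops\)")


def parse_flush_line(line):
    """Return sizes (in decimal MB), op count and live/serialized ratio."""
    match = FLUSH_RE.search(line)
    if match is None:
        raise ValueError("line does not look like a flush message")
    serialized, live, ops = (int(group) for group in match.groups())
    return {
        "serialized_mb": serialized / 1e6,
        "live_mb": live / 1e6,
        "ops": ops,
        "live_ratio": live / serialized,
    }


line = ("Enqueuing flush of Memtable-.... "
        "(17203504/600292480 serialized/live bytes, 320432 ops)")
stats = parse_flush_line(line)
print(stats)
```

For this flush the live figure comes out at roughly 35 times the serialized figure, which is the gap I am trying to understand.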


None of these numbers look even *remotely* close to 70 MB (the
memtable_throughput setting). Anyway, via JMX I went in and changed
throughput to 200 MB and operations to 0.5 million. This did *absolutely
nothing* to the flush behaviour: still ~17 MB serialized, ~600 MB live,
~320k ops, ~34 MB sstables, and flushes every 5-8 minutes (I waited for a
few flushes in case the change took some time to be applied). I also tried
changing the operations threshold to 0.2 million, which DID work, so it's
not a case of the settings not being respected.


WTF is going on? What is deciding that a flush is necessary and where are
all of these crazy size discrepancies coming from? Some additional info and
things to point out:

*   I am NOT seeing "the heap is X full, Cassandra will now flush the
    two largest memtables" warnings or any other errors/unexpected things

*   The sum of memtable_throughput across all 10 CFs is 770 MB, well
    below the default global memtable threshold of ~4 GB on a 12 GB Java
    heap

*   There are no major compactions running on this machine and no
    repairs running across the cluster

*   Hinted handoff is disabled
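
For what it's worth, the headroom arithmetic behind the global-threshold point can be sketched as below. The one-third-of-heap figure is an assumption on my part (it is what makes ~4 GB out of a 12 GB heap), and all of the names are just for illustration.

```python
# Assumption: the default global memtable threshold is one third of the heap,
# which is consistent with the ~4 GB figure on a 12 GB heap quoted above.
HEAP_MB = 12 * 1024
GLOBAL_THRESHOLD_MB = HEAP_MB / 3

# Figures taken from this post.
SUM_PER_CF_THROUGHPUT_MB = 770          # memtable_throughput summed over 10 CFs
OBSERVED_LIVE_MB = 600292480 / 1e6      # "live bytes" of one flushed memtable

# Rough comparison only: this mixes binary and decimal megabytes.
headroom_mb = GLOBAL_THRESHOLD_MB - SUM_PER_CF_THROUGHPUT_MB
print(f"global threshold: {GLOBAL_THRESHOLD_MB:.0f} MB")
print(f"headroom over configured per-CF sum: {headroom_mb:.0f} MB")
print(f"one memtable's live size as a share of the threshold: "
      f"{OBSERVED_LIVE_MB / GLOBAL_THRESHOLD_MB:.0%}")
```

Even taking the ~600 MB live figure at face value, a single memtable for this CF is only a fraction of that global threshold.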


Any insight would be appreciated.


Dan Hendry

