cassandra-user mailing list archives

From aaron morton <>
Subject Re: Cassandra flush spin?
Date Sun, 10 Feb 2013 20:36:57 GMT
Sounds like flushing due to memory consumption. 

The flush log messages include the number of ops, so you can see whether this node was processing more mutations than the others. Check whether more (serialised) data was being written or more operations were being processed.
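As a minimal sketch of pulling op counts out of the flush lines for comparison across nodes (the sample log line below is made up to roughly match the 1.1-era "serialized/live bytes, N ops" flush message; your exact format may differ):

```shell
# Hypothetical sample of a 1.1-era flush log line, written to a temp file
# so the extraction step below is self-contained.
cat > /tmp/sample_system.log <<'EOF'
 INFO [FlushWriter:1] 2013-02-08 18:40:12,345 Memtable.java (line 264) Writing Memtable-msgs@123456(7891011/15782022 serialized/live bytes, 123456 ops)
EOF

# Extract just the op count from each flush line; run the same extraction
# against system.log on each node and compare the numbers.
grep -o '[0-9]\+ ops' /tmp/sample_system.log
# → 123456 ops
```

Running that against the real system.log on each node (instead of the sample file) gives a quick per-node comparison of mutation volume.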

Also, just for fun, check that the JVM and yaml settings are as expected.
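One quick way to spot configuration drift between nodes is to diff their cassandra.yaml files. A self-contained illustration (the two fragments and file paths below are hypothetical; in practice you would diff the real yaml fetched from each node, e.g. over ssh):

```shell
# Hypothetical yaml fragments from two nodes, to show how a drifted
# memtable setting would surface in a diff.
cat > /tmp/node_a.yaml <<'EOF'
memtable_total_space_in_mb: 2048
EOF
cat > /tmp/node_b.yaml <<'EOF'
memtable_total_space_in_mb: 512
EOF

# diff exits non-zero when the files differ, hence the `|| true`.
diff /tmp/node_a.yaml /tmp/node_b.yaml || true
```

A node whose memtable_total_space_in_mb (or JVM heap flags) silently differs from its peers would flush far more often under the same write load.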


Aaron Morton
Freelance Cassandra Developer
New Zealand


On 10/02/2013, at 6:29 AM, Mike <> wrote:

> Hello,
> We just hit a very odd issue in our Cassandra cluster.  We are running Cassandra 1.1.2 in a 6-node cluster.  We use a replication factor of 3, and all operations use LOCAL_QUORUM.
> We noticed a large performance hit in our application's maintenance activities, and I've been investigating.  I discovered a node in the cluster that was flushing a memtable like crazy.  It was flushing every 2-3 minutes, and had apparently been doing this for days.  Typically, during this time of day, a flush would happen every 30 minutes or so.
> "cat /var/log/cassandra/system.log | grep \"flushing high-traffic column family CFS(Keyspace='open', ColumnFamily='msgs')\" | grep 02-08 | wc -l"
> [1] 18:41:04 [SUCCESS] db-1c-1
> 59
> [2] 18:41:05 [SUCCESS] db-1c-2
> 48
> [3] 18:41:05 [SUCCESS] db-1a-1
> 1206
> [4] 18:41:05 [SUCCESS] db-1d-2
> 54
> [5] 18:41:05 [SUCCESS] db-1a-2
> 56
> [6] 18:41:05 [SUCCESS] db-1d-1
> 52
> I restarted the database node and, at least for now, the problem appears to have stopped.
> There are a number of things that don't make sense here.  We use a replication factor of 3, so if this were being caused by our application, I would have expected 3 nodes in the cluster to have the issue.  I would also have expected the issue to continue once the node was restarted.
> Another point of interest, and I'm wondering if it's exposed a bug: this node was recently converted to use ephemeral storage on EC2, and was restored from a snapshot.  After the restore, a nodetool repair was run.  However, the repair was going to run into some heavy activity for our application, so we canceled that validation compaction (2 of the 3 anti-entropy sessions had completed).  The spin appears to have started at the start of the second session.
> Any hints?
> -Mike
