Sounds like there is a lot going on. 

I'm going to assume the order you showed the HH stats in is the order of the nodes. I'm guessing node .180 is node 2, but it would be easier if you could identify the nodes and label which stats belong to which.

In no particular order:

* Having a heavily used MySQL box on one node is going to make your life *much* harder than it needs to be. The HH stats look really suspicious: nodes 1 and 3 are running lots of HH and node 2 is not. It looks like node 2 is having a hard time keeping up, and I would guess MySQL has something to do with that. Does Cassandra on node 2 have the same memory allocation as the other nodes? Were the TP stats below, the ones that showed the flush stage blocking, from node 2? What are the %util and avgqu-sz from iostat on node 2 compared to the other nodes?

* Ignore all the metrics about what is happening on node 2 and get MySQL off there.

* You are running a 16GB heap; that is at the upper range (some would say above the upper range) of what is effective for Java/Cassandra. 8GB seems to be a popular sweet spot. Work out why you are using so much memory and then try to get the heap size down to 8GB.

* Try to understand what's happening with memory usage. Do you have lots of CFs? Lots of row cache? Some very big rows? There are several features in 0.8.6 (see the yaml) that will kick in when memory usage is high, but you need to understand why it's high.
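For reference, those 0.8-series safety valves live in cassandra.yaml, and heap size for the Debian packages is set via MAX_HEAP_SIZE / HEAP_NEWSIZE in /etc/cassandra/cassandra-env.sh. A sketch of the yaml knobs (key names and defaults as shipped in the 0.8 series, to the best of my recollection; verify against your own file):

```yaml
# cassandra.yaml (0.8.x) -- emergency memory valves:
flush_largest_memtables_at: 0.75   # flush the biggest memtables once the heap is 75% full
reduce_cache_sizes_at: 0.85        # shrink row/key caches once the heap is 85% full
reduce_cache_capacity_to: 0.6      # ...down to 60% of their configured capacity
```

The first threshold is the one behind the "Heap is 0.75 full" warning you quoted; these valves treat the symptom, so the aim should still be to find out why the heap is filling.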

Hope that helps. 

Aaron Morton
Freelance Cassandra Developer

On 5/10/2011, at 11:36 PM, Philippe wrote:

I was mistaken in saying there weren't writes to the cluster. There's a process that's doing a couple of mutations per second.

I just restarted node #3 and found this message on node #1
 INFO [HintedHandoff:1] 2011-10-05 12:25:08,173 (line 314) Endpoint /xx.xx.xx.180 died before hint delivery, aborting

Could HH have gotten the nodes on the receiving end stuck? Is there any way to throttle it?
If it can't be throttled and you confirm HH is a suspect, I may simply disable it: I'm running a repair three times a week (once per node), so I guess my cluster won't get too far out of sync.
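(For reference, a sketch of the hinted handoff knobs in the stock 0.8-series cassandra.yaml; key names as I recall them, so confirm against your own file before changing anything:)

```yaml
# cassandra.yaml (0.8.x) -- hinted handoff settings:
hinted_handoff_enabled: false       # turn HH off entirely
max_hint_window_in_ms: 3600000      # or just cap how long hints accumulate (1 hour)
# hinted_handoff_throttle_delay_in_ms: 50   # per-hint delay, if your version exposes it
```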


2011/10/5 Philippe <>
Thanks for the quick responses.

Using Hector 0.8.0-1
Hardware is :
  • AMD Opteron 4174 6x 2.30+ GHz
  • 32 GB DDR3
  • 1 Gbps Lossless

I'm running 0.8.6 on all nodes, straight from the Debian packages.
I get hinted handoffs from time to time because of flapping; I've been planning to increase phi as per another thread but haven't yet.
Here are the HH per node:
HintedHandoff                     0         0            437         0                 0
HintedHandoff                     0         0              2         0                 0
HintedHandoff                     0         0           1798         0                 0
Not seeing any iowait on my Munin CPU graph, at least not more than over the past couple of weeks. There is a little more on the 2nd node because it's also holding a MySQL database that gets hit hard.
The Munin iostat graph shows an average of 10-20K blocks/s read/write.

Nothing special was happening besides the weekly repair on node 2, which started yesterday at 4am. That one failed with:
ERROR [AntiEntropySessions:5] 2011-10-04 04:03:56,676 (line 139) Fatal exception in thread Thread[AntiEntropySessions:5,5,RMI Runtime]
java.lang.RuntimeException: Problem during repair session manual-repair-0d4125f0-aab7-403b-a083-7c19ef6579b1, endpoint /xx.xx.xx.180 died
Then the next planned repairs failed before starting:
INFO [AntiEntropySessions:8] 2011-10-04 04:03:57,137 (line 658) Could not proceed on repair because a neighbor (/xx.xx.xx.180) is dead: manual-repair-5dc4c221-3a15-4031-9aa8-0931e41816cd failed

Looking at the logs on that node shows no exception. I was about to say "nothing special was happening at that time," except that at 4am the GC started working hard: it got the heap down to 9GB, then usage shot straight back up to almost 16GB, so I guess ParNew couldn't keep up and ConcurrentMarkSweep had to step in and basically hang the server? It took another 2 minutes until I got the "Heap is 0.75 full" message, and I see a lot of StatusLogger messages before that.
So it looks like computing the Merkle tree was very expensive this time... I wonder why? Anything I can do to handle this?

INFO [ScheduledTasks:1] 2011-10-04 04:03:12,874 (line 122) GC for ParNew: 427 ms for 2 collections, 16227154488 used; max is 16838033408
 INFO [GossipTasks:1] 2011-10-04 04:04:24,092 (line 697) InetAddress /xx.xx.xx.97 is now dead.
 INFO [ScheduledTasks:1] 2011-10-04 04:04:24,093 (line 122) GC for ParNew: 26376 ms for 2 collections, 8832125416 used; max is 16838033408
(no GC logs until)
 INFO [ScheduledTasks:1] 2011-10-04 04:04:24,251 (line 122) GC for ConcurrentMarkSweep: 16250 ms for 3 collections, 9076209720 used; max is 16838033408
 WARN [ScheduledTasks:1] 2011-10-04 04:06:52,777 (line 143) Heap is 0.752707974197173 full.  You may need to reduce memtable and/or cache sizes.  Cassandra will now flush up to the two largest memtables to free up memory.  Adjust flush_largest_memtables_at threshold in cassandra.yaml if you don't want Cassandra to do this automatically

2011/10/5 aaron morton <>
Lots of hinted handoff can give you mutations…

HintedHandoff                     0         0           1798         0                 0

1798 is somewhat high. These are the HH tasks on this node though; can you see HH running on other nodes in the cluster? What has been happening on this node?

HH is throttled to avoid this sort of thing; what version are you on?

Also looks like the disk IO could not keep up with the flushing….

FlushWriter                       0         0           5714         0               499
You need to provide some more info on what was happening to the nodes beforehand. And check the logs on all machines for errors etc.


Aaron Morton
Freelance Cassandra Developer

On 5/10/2011, at 9:52 PM, Yi Yang wrote:

Well, what client are you using? And can you give a hint about your node hardware?

Sent from my BlackBerry® wireless device

From: Philippe <>
Date: Wed, 5 Oct 2011 10:33:21 +0200
Subject: Why is mutation stage increasing ??

I have my 3-node, RF=3 cluster acting strangely. Can someone shed some light on what is going on?
It was stuck for a couple of hours (all clients timed out). nodetool tpstats showed a huge, increasing MutationStage pending count (in the hundreds of thousands).
I restarted one node and it took a while to replay GBs of commitlog. I've shut down all clients that write to the cluster and it's just weird.

All nodes are still showing huge MutationStage counts, including the restarted one, and it's either increasing or stable. The pending count is stuck at 32.
compactionstats shows no compaction on 2 nodes and dozens of Scrub compactions (all at 100%) on the 3rd one. These are from a scrub I did last week when I encountered assertion errors.
netstats shows no streams being exchanged on any node, but each one is expecting a few Responses.
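As an aside, the tpstats dump below can be triaged mechanically. A rough sketch (tpstats.txt is a hypothetical scratch file holding the pool lines; column positions assume the 0.8-era format of Pool Name, Active, Pending, Completed, Blocked, All time blocked):

```shell
# Save a few pool lines from nodetool tpstats (sample rows copied from the dump below).
cat > tpstats.txt <<'EOF'
ReadStage                         0         0    18372664517         0                 0
MutationStage                    32    565879      295492216         0                 0
FlushWriter                       0         0           5714         0               499
EOF
# Flag pools with active/pending work or all-time-blocked tasks.
awk '$2 > 0 || $3 > 0 || $6 > 0 {print $1, "active="$2, "pending="$3, "blocked="$6}' tpstats.txt
```

Here MutationStage and FlushWriter get flagged, which matches the pools that look unhealthy in the full dump.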

Any ideas ?

For example (increased to 567062 while I was writing this email)
Pool Name                    Active   Pending      Completed   Blocked  All time blocked
ReadStage                         0         0    18372664517         0                 0
RequestResponseStage              0         0    10731370183         0                 0
MutationStage                    32    565879      295492216         0                 0
ReadRepairStage                   0         0          23654         0                 0
ReplicateOnWriteStage             0         0        7733659         0                 0
GossipStage                       0         0        3502922         0                 0
AntiEntropyStage                  0         0           1631         0                 0
MigrationStage                    0         0              0         0                 0
MemtablePostFlusher               0         0           5716         0                 0
StreamStage                       0         0             10         0                 0
FlushWriter                       0         0           5714         0               499
FILEUTILS-DELETE-POOL             0         0            773         0                 0
MiscStage                         0         0           1266         0                 0
FlushSorter                       0         0              0         0                 0
AntiEntropySessions               0         0             18         0                 0
InternalResponseStage             0         0              0         0                 0
HintedHandoff                     0         0           1798         0                 0

Mode: Normal
Not sending any streams.
Not receiving any streams.
Pool Name                    Active   Pending      Completed
Commands                        n/a         0     1223769753
Responses                       n/a         4     1627481305