I was mistaken in saying there weren't writes to the cluster. There's a process that's doing a couple of mutations per second.
I just restarted node #3 and found this message on node #1:
INFO [HintedHandoff:1] 2011-10-05 12:25:08,173 HintedHandOffManager.java (line 314) Endpoint /xx.xx.xx.180 died before hint delivery, aborting
Could HH have gotten the nodes on the receiving end stuck? Is there any way to throttle it?
If it can't be throttled and you confirm HH is a suspect, I may simply disable it: I'm running a repair three times a week (once per node), so I guess my cluster won't drift too far out of sync.
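For reference, disabling it should just be a matter of flipping one setting in cassandra.yaml and restarting (setting name as I understand it from the 0.8 debian packages; double-check against your packaged yaml):

```yaml
# cassandra.yaml -- stop storing hints for dead replicas (restart required)
hinted_handoff_enabled: false
```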
Thanks for the quick responses.
Using Hector 0.8.0-1
Hardware is:
- AMD Opteron 4174, 6 cores @ 2.30+ GHz
- 32 GB DDR3
- 1 Gbps lossless network
I'm running 0.8.6 on all nodes, straight from the debian packages.
I get hinted handoffs from time to time because of flapping; I've been planning to increase the phi threshold as per another thread but haven't yet.
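For context, my (simplified) understanding of the phi accrual failure detector: phi is roughly minus the log10 of the probability of the observed heartbeat silence, so raising phi_convict_threshold makes the detector tolerate longer pauses (like GC stalls) before declaring a node dead. A rough sketch under an exponential heartbeat model, not Cassandra's actual code (which tracks a sliding window of inter-arrival times):

```python
import math

def phi(silence: float, mean_interval: float) -> float:
    """Suspicion level: -log10 P(no heartbeat for `silence` seconds),
    assuming exponentially distributed heartbeat intervals."""
    return silence / (mean_interval * math.log(10))

# With a 1s mean heartbeat interval, a 26s pause (like the GC pause in
# my logs) pushes phi well past the default convict threshold of 8:
print(round(phi(26.0, 1.0), 2))  # -> 11.29
```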
Here are the HH counts per node:
HintedHandoff 0 0 437 0 0
HintedHandoff 0 0 2 0 0
HintedHandoff 0 0 1798 0 0
Not seeing any iowait on my Munin CPU graphs, at least not more than over the past couple of weeks. There is a little more on the 2nd node because it also holds a MySQL database that gets hit hard.
The Munin iostat graphs show an average of 10-20K blocks/s read/write.
Nothing special was happening besides the weekly repair on node two, which started yesterday at 4am. That one failed with:
ERROR [AntiEntropySessions:5] 2011-10-04 04:03:56,676 AbstractCassandraDaemon.java (line 139) Fatal exception in thread Thread[AntiEntropySessions:5,5,RMI Runtime]
java.lang.RuntimeException: java.io.IOException: Problem during repair session manual-repair-0d4125f0-aab7-403b-a083-7c19ef6579b1, endpoint /xx.xx.xx.180 died
Then the next scheduled repairs failed before starting:
INFO [AntiEntropySessions:8] 2011-10-04 04:03:57,137 AntiEntropyService.java (line 658) Could not proceed on repair because a neighbor (/xx.xx.xx.180) is dead: manual-repair-5dc4c221-3a15-4031-9aa8-0931e41816cd failed
Looking at the logs on that node shows no exception. I was about to say "nothing special was happening at that time", except that at 4am the GC started working hard: it got the heap down to 9 GB, then usage shot straight back up to almost 16 GB, so I guess ParNew couldn't keep up and ConcurrentMarkSweep had to step in, essentially hanging the server? It took another 2 minutes until I got the "Heap is 0.75 full" message, and I see a lot of StatusLogger messages before that.
So it looks like computing the Merkle tree was very expensive this time... I wonder why? Anything I can do to handle this?
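For anyone else following along, my understanding of what repair is doing here: each replica hashes its rows into a Merkle tree, and only subtrees whose roots differ between replicas get streamed. Building the tree means reading and hashing a lot of data, which is presumably what hit the heap. A toy sketch of the idea, not Cassandra's actual implementation (which hashes token ranges into a fixed-depth tree):

```python
import hashlib

def merkle_root(row_hashes: list) -> bytes:
    """Combine leaf hashes pairwise until a single root remains.
    Two replicas with equal roots hold identical data for the range."""
    level = [hashlib.sha256(h).digest() for h in row_hashes]
    while len(level) > 1:
        if len(level) % 2:                 # odd level: carry the last node up
            level.append(level[-1])
        level = [hashlib.sha256(level[i] + level[i + 1]).digest()
                 for i in range(0, len(level), 2)]
    return level[0]

# Replicas agree iff their roots match:
print(merkle_root([b"row1", b"row2"]) == merkle_root([b"row1", b"row2"]))  # True
```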
INFO [ScheduledTasks:1] 2011-10-04 04:03:12,874 GCInspector.java (line 122) GC for ParNew: 427 ms for 2 collections, 16227154488 used; max is 16838033408
INFO [GossipTasks:1] 2011-10-04 04:04:24,092 Gossiper.java (line 697) InetAddress /xx.xx.xx.97 is now dead.
INFO [ScheduledTasks:1] 2011-10-04 04:04:24,093 GCInspector.java (line 122) GC for ParNew: 26376 ms for 2 collections, 8832125416 used; max is 16838033408
(no GC logs until)
INFO [ScheduledTasks:1] 2011-10-04 04:04:24,251 GCInspector.java (line 122) GC for ConcurrentMarkSweep: 16250 ms for 3 collections, 9076209720 used; max is 16838033408
WARN [ScheduledTasks:1] 2011-10-04 04:06:52,777 GCInspector.java (line 143) Heap is 0.752707974197173 full. You may need to reduce memtable and/or cache sizes. Cassandra will now flush up to the two largest memtables to free up memory. Adjust flush_largest_memtables_at threshold in cassandra.yaml if you don't want Cassandra to do this automatically
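The emergency flush that warning mentions is governed by a few cassandra.yaml knobs; these are the 0.8-era defaults as I understand them (double-check against your packaged yaml):

```yaml
# cassandra.yaml -- emergency memory thresholds (assumed 0.8 defaults)
flush_largest_memtables_at: 0.75   # flush the biggest memtables at 75% heap
reduce_cache_sizes_at: 0.85        # start shrinking caches at 85% heap
reduce_cache_capacity_to: 0.6      # target cache capacity when shrinking
```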
2011/10/5 aaron morton <email@example.com>
Lots of hinted handoff can give you mutations…

HintedHandoff 0 0 1798 0 0

1798 is somewhat high. This is the HH task count on this node, though; can you see HH running on other nodes in the cluster? What has been happening on this node?

FlushWriter 0 0 5714 0 499

On 5/10/2011, at 9:52 PM, Yi Yang wrote:

Which client are you using? And can you give a hint about your node hardware?

Sent from my BlackBerry® wireless device

From: Philippe <firstname.lastname@example.org>
Date: Wed, 5 Oct 2011 10:33:21 +0200
To: user <email@example.com>
Reply-To: firstname.lastname@example.org
Subject: Why is mutation stage increasing?

Hello,
My 3-node, RF=3 cluster is acting strangely. Can someone shed some light on what is going on?
It was stuck for a couple of hours (all clients timed out). nodetool tpstats showed huge, increasing MutationStage counts (in the hundreds of thousands).
I restarted one node and it took a while to replay GBs of commitlog. I've shut down all clients that write to the cluster, and it's just weird:
All nodes are still showing huge MutationStage counts, including the restarted one, and they're either increasing or stable. The active count is stuck at 32.
Compactionstats shows no compactions on 2 nodes and dozens of Scrub compactions (all at 100%) on the 3rd one; those are from a scrub I did last week when I encountered assertion errors.
Netstats shows no streams being exchanged on any node, but each one is expecting a few Responses.
Any ideas ?
For example, here is one node's tpstats (MutationStage pending increased to 567062 while I was writing this email):
Pool Name Active Pending Completed Blocked All time blocked
ReadStage 0 0 18372664517 0 0
RequestResponseStage 0 0 10731370183 0 0
MutationStage 32 565879 295492216 0 0
ReadRepairStage 0 0 23654 0 0
ReplicateOnWriteStage 0 0 7733659 0 0
GossipStage 0 0 3502922 0 0
AntiEntropyStage 0 0 1631 0 0
MigrationStage 0 0 0 0 0
MemtablePostFlusher 0 0 5716 0 0
StreamStage 0 0 10 0 0
FlushWriter 0 0 5714 0 499
FILEUTILS-DELETE-POOL 0 0 773 0 0
MiscStage 0 0 1266 0 0
FlushSorter 0 0 0 0 0
AntiEntropySessions 0 0 18 0 0
InternalResponseStage 0 0 0 0 0
HintedHandoff 0 0 1798 0 0
Not sending any streams.
Not receiving any streams.
Pool Name Active Pending Completed
Commands n/a 0 1223769753
Responses n/a 4 1627481305