cassandra-user mailing list archives

From Michael Fong <michael.f...@ruckuswireless.com>
Subject C* 1.2.x vs Gossip marking DOWN/UP
Date Wed, 13 Apr 2016 08:58:40 GMT
Hi, all


We have a 4-node Cassandra cluster (C* 1.2.x) in which one node marked all the other 3 nodes
DOWN, and they came back UP a few seconds later. A compaction of roughly 10 MB had kicked in
about a minute earlier, and the node then marked all the other nodes DOWN. In other words,
in system.log we see:
00:00:00 Compacting ....
00:00:03 Compacted 8 sstables ... 10~ megabytes
00:01:06 InetAddress /x.x.x.4 is now DOWN
00:01:06 InetAddress /x.x.x.3 is now DOWN
00:01:06 InetAddress /x.x.x.1 is now DOWN

There was no significant GC activity in gc.log. We have heard that heavy compaction activity
can cause this behavior, but we cannot work out logically why it would. How could a
compaction operation stop the Gossip thread from performing its heartbeat check? Has anyone
experienced this kind of behavior before?

Thanks in advance!

Sincerely,

Michael Fong
