Hi,

I had the same problem this morning; it seems to be related to the leap second bug.

Rebooting the nodes fixed it for me, but there also seems to be a fix that does not require rebooting the server.

Kind regards,

Pieter

From: feedly team [mailto:feedlydev@gmail.com]
Sent: Monday, 2 July 2012 17:09
To: user@cassandra.apache.org
Subject: frequent node up/downs

Hello,

   I recently set up a 2-node Cassandra cluster on dedicated hardware. In the logs there have been a lot of 'InetAddress xxx is now dead' or UP messages. Comparing the log messages between the 2 nodes, they seem to coincide with extremely long ParNew collections. I have seen some of up to 50 seconds. The installation is pretty vanilla: I didn't change any settings, and the machines don't seem particularly busy. Cassandra is the only thing running on each machine, with an 8GB heap. Each machine has 64GB of RAM, and CPU/IO usage looks pretty light. I do see a lot of 'Heap is xxx full. You may need to reduce memtable and/or cache sizes' messages. Would reducing those sizes help with the long ParNew collections? That message seems to be triggered on a full collection.
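
One way to check the correlation described above is to pull the ParNew pause times out of the Cassandra log and list the long ones next to the up/down messages. This is only a minimal sketch. It assumes the GC log lines contain text of the form "GC for ParNew: <n> ms" (the exact wording may differ between Cassandra versions, so the pattern may need adjusting). The function name, the threshold, and the sample lines are made up for illustration.

```python
import re

# Assumed shape of a GC log line; adjust the pattern to match your Cassandra version.
PARNEW_RE = re.compile(r"GC for ParNew: (\d+) ms")


def long_parnew_pauses(lines, threshold_ms=1000):
    """Return (line_number, pause_ms) for each ParNew pause longer than threshold_ms."""
    hits = []
    for lineno, line in enumerate(lines, start=1):
        match = PARNEW_RE.search(line)
        if match:
            pause_ms = int(match.group(1))
            if pause_ms > threshold_ms:
                hits.append((lineno, pause_ms))
    return hits


# Made-up sample lines, for illustration only (not taken from a real log).
sample = [
    "INFO [ScheduledTasks:1] GCInspector.java GC for ParNew: 235 ms for 1 collections",
    "INFO [GossipStage:1] Gossiper.java InetAddress /10.0.0.2 is now dead.",
    "INFO [ScheduledTasks:1] GCInspector.java GC for ParNew: 50123 ms for 1 collections",
]

print(long_parnew_pauses(sample))  # [(3, 50123)]
```

For a real log, pass an open file object instead of the sample list. The line numbers returned make it easy to compare the timestamps of the long pauses with the nearby 'is now dead' messages.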