We're running a small Cassandra cluster (1.1.4) with two nodes and
serving data to our web and Java applications. After upgrading
Cassandra from 1.0.8 to 1.1.4, we're starting to see some odd behavior.
If we run the 'ring' command from the second node, it reports that it
failed to connect to port 7199 on node 1:
/opt/apache-cassandra-1.1.4/bin/nodetool -h XX.XX.XX.01 ring
Failed to connect to 'XX.XX.XX.01:7199': Connection refused
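For what it's worth, this is roughly how we've been checking the JMX side on node 1 (the paths reflect our install location; the exact invocations are just what we tried):

# Is anything listening on the JMX port (7199) on node 1?
netstat -tlnp | grep 7199

# Which JMX port is Cassandra configured to use?
grep JMX_PORT /opt/apache-cassandra-1.1.4/conf/cassandra-env.sh

# Does nodetool work when run locally on node 1?
/opt/apache-cassandra-1.1.4/bin/nodetool -h localhost ring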
We're using a network monitoring system (NMS) and Monit to monitor the
servers. In the NMS, average CPU usage has increased to around 500% on
our quad-core Xeon servers with 16 GB RAM, and occasionally Monit shows
the 1-minute load average going above 7.
Is this common? Does this happen to everyone else? And why the
spikiness in load? We can't find anything in the Cassandra logs
indicating that something's up (such as a slow GC or compaction),
and there's no corresponding traffic spike in the application
either. Should we just add more nodes if any single one becomes
CPU-bound?
Another explanation could be that we've configured it wrong. We're
running pretty much the default config, and each node has 16 GB of RAM.
We have a single keyspace with 15 to 20 column families, RF=2, and
260 GB of actual data.
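If it helps, these are the sorts of settings we've left at their defaults (key names from the stock 1.1 cassandra.yaml; the path is our install location):

# Settings most likely to matter for CPU/load, all still at defaults for us
grep -E 'concurrent_reads|concurrent_writes|compaction_throughput_mb_per_sec|rpc_server_type' \
    /opt/apache-cassandra-1.1.4/conf/cassandra.yaml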
Please find top and I/O stats for the nodes below.
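(They were captured roughly as follows; the exact flags are our best guess at a sensible invocation:)

# One batch-mode snapshot of top, plus extended iostat over a short interval
top -b -n 1 | head -n 20
iostat -x 2 5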