Hi folks,

I have been loading a 6-server Cassandra cluster with 1KB records. After a few million inserts, the insert rate drops dramatically. On investigation, one of the Cassandra servers appears to be in a bad state: it uses 100% of one core on an 8-core machine and 0% of the others. Inserts to this box have stopped completely, and inserts to the other boxes have slowed by more than a factor of 10. Sending “kill” or “kill -3” to the bad Java process does nothing; I have to use “kill -9” to stop it. Has anybody experienced anything like this?
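
For anyone who wants to look for the one busy thread, here is a minimal sketch of how it might be found through the standard java.lang.management API. The class name BusyThreads and the host argument are hypothetical; port 8080 is the JMX port from my JVM options, and the URL is the usual RMI form for the built-in JMX agent. A JVM that ignores “kill -3” may well not answer JMX either, so treat this as something to try, not a guaranteed diagnosis.

```java
import java.lang.management.ManagementFactory;
import java.lang.management.ThreadInfo;
import java.lang.management.ThreadMXBean;
import java.util.ArrayList;
import java.util.Collections;
import java.util.Comparator;
import java.util.List;
import javax.management.MBeanServerConnection;
import javax.management.remote.JMXConnector;
import javax.management.remote.JMXConnectorFactory;
import javax.management.remote.JMXServiceURL;

// Hypothetical helper, not part of Cassandra.
class BusyThreads {
    /** Returns up to `limit` lines of "cpu-ms state name", busiest thread first. */
    static List<String> topThreads(ThreadMXBean threads, int limit) {
        List<long[]> usage = new ArrayList<long[]>(); // each entry: {threadId, cpuNanos}
        for (long id : threads.getAllThreadIds()) {
            long cpu = threads.getThreadCpuTime(id); // -1 if the thread is gone or timing is disabled
            if (cpu >= 0) {
                usage.add(new long[] {id, cpu});
            }
        }
        // Sort by accumulated CPU time, descending.
        Collections.sort(usage, new Comparator<long[]>() {
            public int compare(long[] a, long[] b) {
                return a[1] < b[1] ? 1 : (a[1] > b[1] ? -1 : 0);
            }
        });
        List<String> lines = new ArrayList<String>();
        for (long[] u : usage) {
            if (lines.size() >= limit) {
                break;
            }
            ThreadInfo info = threads.getThreadInfo(u[0]);
            if (info == null) {
                continue; // thread exited between the two calls
            }
            lines.add(String.format("%8d ms  %-13s %s",
                    u[1] / 1000000L, info.getThreadState(), info.getThreadName()));
        }
        return lines;
    }

    public static void main(String[] args) throws Exception {
        if (args.length == 0) {
            // No host given: inspect the JVM running this program.
            for (String line : topThreads(ManagementFactory.getThreadMXBean(), 5)) {
                System.out.println(line);
            }
            return;
        }
        // Assumed: args[0] is the Cassandra host, JMX agent on port 8080.
        JMXServiceURL url = new JMXServiceURL(
                "service:jmx:rmi:///jndi/rmi://" + args[0] + ":8080/jmxrmi");
        JMXConnector connector = JMXConnectorFactory.connect(url);
        try {
            MBeanServerConnection conn = connector.getMBeanServerConnection();
            ThreadMXBean remote = ManagementFactory.newPlatformMXBeanProxy(
                    conn, ManagementFactory.THREAD_MXBEAN_NAME, ThreadMXBean.class);
            for (String line : topThreads(remote, 5)) {
                System.out.println(line);
            }
        } finally {
            connector.close();
        }
    }
}
```

The figures are CPU time accumulated since each thread started, so running it twice a few seconds apart and comparing the two outputs should show which thread is currently spinning.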

Additional info:

Each server has 8 cores and 8GB of memory. I am running 64-bit Java 1.6 with the following JVM options:

# Arguments to pass to the JVM
JVM_OPTS=" \
        -ea \
        -Xdebug \
        -Xrunjdwp:transport=dt_socket,server=y,address=8888,suspend=n \
        -Xms128M \
        -Xmx6G \
        -XX:SurvivorRatio=8 \
        -XX:TargetSurvivorRatio=90 \
        -XX:+AggressiveOpts \
        -XX:+UseParNewGC \
        -XX:+UseConcMarkSweepGC \
        -XX:CMSInitiatingOccupancyFraction=1 \
        -XX:+CMSParallelRemarkEnabled \
        -XX:+HeapDumpOnOutOfMemoryError \
        -Dcom.sun.management.jmxremote.port=8080 \
        -Dcom.sun.management.jmxremote.ssl=false \
        -Dcom.sun.management.jmxremote.authenticate=false"

(These are the standard options from the Cassandra distribution, except for the 6GB maximum heap.)
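
To check whether the heap and collector settings above are actually in effect, and how much time the collectors have used, something like the following sketch could read the standard platform MBeans. GcReport is a hypothetical name; the MBean names (java.lang:type=Memory and java.lang:type=GarbageCollector,*) and their attributes are the standard JVM ones, and port 8080 is again taken from my options. I have not confirmed that garbage collection is involved in the problem.

```java
import java.lang.management.ManagementFactory;
import java.lang.management.MemoryUsage;
import java.util.ArrayList;
import java.util.List;
import javax.management.MBeanServerConnection;
import javax.management.ObjectName;
import javax.management.openmbean.CompositeData;
import javax.management.remote.JMXConnector;
import javax.management.remote.JMXConnectorFactory;
import javax.management.remote.JMXServiceURL;

// Hypothetical helper, not part of Cassandra.
class GcReport {
    /** First line describes the heap; each following line describes one collector. */
    static List<String> report(MBeanServerConnection conn) throws Exception {
        List<String> lines = new ArrayList<String>();
        long mb = 1024L * 1024L;

        // Heap usage is exposed as CompositeData and converted back to MemoryUsage.
        CompositeData raw = (CompositeData) conn.getAttribute(
                new ObjectName("java.lang:type=Memory"), "HeapMemoryUsage");
        MemoryUsage heap = MemoryUsage.from(raw);
        // getMax() is -1 when the JVM reports no defined maximum.
        String max = heap.getMax() < 0 ? "undefined" : (heap.getMax() / mb) + "MB";
        lines.add("heap used=" + (heap.getUsed() / mb) + "MB committed="
                + (heap.getCommitted() / mb) + "MB max=" + max);

        // One MBean per collector; with my options these should be ParNew and CMS.
        for (ObjectName gc : conn.queryNames(
                new ObjectName("java.lang:type=GarbageCollector,*"), null)) {
            lines.add(gc.getKeyProperty("name")
                    + " collections=" + conn.getAttribute(gc, "CollectionCount")
                    + " timeMs=" + conn.getAttribute(gc, "CollectionTime"));
        }
        return lines;
    }

    public static void main(String[] args) throws Exception {
        if (args.length == 0) {
            // No host given: report on the JVM running this program.
            for (String line : report(ManagementFactory.getPlatformMBeanServer())) {
                System.out.println(line);
            }
            return;
        }
        // Assumed: args[0] is the Cassandra host, JMX agent on port 8080.
        JMXConnector connector = JMXConnectorFactory.connect(new JMXServiceURL(
                "service:jmx:rmi:///jndi/rmi://" + args[0] + ":8080/jmxrmi"));
        try {
            for (String line : report(connector.getMBeanServerConnection())) {
                System.out.println(line);
            }
        } finally {
            connector.close();
        }
    }
}
```

If the collection counts keep climbing between two runs while inserts are stalled, that would at least suggest looking at the collector; if they stay flat, the busy core is probably doing something else.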

The replication factor is 1 (this is just a test, not a production setup), and the memtable size is set to 1GB.

Thanks…

brian