2011/3/8 Peter Schuller <peter.schuller@infidyne.com>

(1) I cannot stress this one enough: Run with -XX:+PrintGC
-XX:+PrintGCDetails -XX:+PrintGCTimeStamps and collect the output.
(2) Attach to your process with jconsole or some similar tool.
(3) Observe the behavior of the heap over time. Preferably post
screenshots so others can look at them.


Sorry, I'm not sure you fully understood me.
 
I launch Cassandra with the following GC logging options (I didn't mention this before because the troubleshooting document at http://www.datastax.com/docs/0.7/troubleshooting/index#nodes-seem-to-freeze-after-some-period-of-time makes no mention of gc.log):

JVM_OPTS="$JVM_OPTS -XX:+PrintGCApplicationStoppedTime"
JVM_OPTS="$JVM_OPTS -Xloggc:/var/log/cassandra/gc.log"
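
Combined with the three flags recommended in (1), the GC logging setup could look roughly like the sketch below. It assumes the lines live in a shell-sourced environment file such as conf/cassandra-env.sh; the first line is only there so the fragment also runs on its own.

```shell
# Assumption: JVM_OPTS may already be set by the surrounding script.
JVM_OPTS="${JVM_OPTS:-}"

# Flags recommended in (1): basic, detailed and timestamped GC output.
JVM_OPTS="$JVM_OPTS -XX:+PrintGC"
JVM_OPTS="$JVM_OPTS -XX:+PrintGCDetails"
JVM_OPTS="$JVM_OPTS -XX:+PrintGCTimeStamps"

# Options already in use: log stop-the-world time and write to gc.log.
JVM_OPTS="$JVM_OPTS -XX:+PrintGCApplicationStoppedTime"
JVM_OPTS="$JVM_OPTS -Xloggc:/var/log/cassandra/gc.log"

# Show the resulting option string for a quick sanity check.
echo "$JVM_OPTS"
```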
 

I can tell that the nodes are frozen from log entries like the following:

Total time for which application threads were stopped: 30.0000957 seconds
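
To pull only the long pauses out of gc.log, a small filter along these lines may help. It is a minimal sketch: the function name long_pauses and the 1-second threshold in the demo are made up for illustration, and it only assumes that the duration is the field right after "stopped:", as in the entry above.

```shell
# Print every stop-the-world pause of at least MIN_SECONDS.
# Usage: long_pauses MIN_SECONDS [FILE]   (reads stdin when FILE is omitted)
long_pauses() {
    min="$1"
    shift
    awk -v min="$min" '
        /Total time for which application threads were stopped:/ {
            # The duration is the field that follows "stopped:".
            for (i = 1; i < NF; i++) {
                if ($i == "stopped:") {
                    if ($(i + 1) + 0 >= min + 0) print $(i + 1)
                    break
                }
            }
        }
    ' "$@"
}

# Demo with the entry quoted above; prints 30.0000957
printf '%s\n' 'Total time for which application threads were stopped: 30.0000957 seconds' \
    | long_pauses 1
```

Against the real log this would be called as `long_pauses 1 /var/log/cassandra/gc.log`.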

Similar entries follow in the log. Also, when I think the nodes are frozen, I get UnavailableException and TimedOutException about 20-30 times (I make several attempts, up to 300 with a 1-second sleep between them, before the final failure). The following code fragment illustrates what I do:

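        // Retry the write up to 300 times, sleeping 1 second after each
        // UnavailableException or TimedOutException.
        // $l_i, $l_exceptions and $retval are assumed to be initialized
        // before this fragment.
        for(; $l_i < 300; ++$l_i)
        {
            try
            {
                $client->batch_mutate($mutations, cassandra_ConsistencyLevel::QUORUM);
                $retval = true;

                break;
            }
            catch(cassandra_UnavailableException $e)
            {
                // Remember the exception class and retry after a pause.
                array_push($l_exceptions, get_class($e));
                sleep(1);
            }
            catch(cassandra_TimedOutException $e)
            {
                // Remember the exception class and retry after a pause.
                array_push($l_exceptions, get_class($e));
                sleep(1);
            }
            catch(Exception $e)
            {
                // Any other error is not retried: log it and give up.
                $loger->err(get_class($e).': '.$e->getMessage());
                $loger->err($mutations);

                break;
            }
        }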