I'm seeing almost the same thing. When I run some benchmark write tests, occasionally one Cassandra node will freeze, and the other nodes will consider it down and then back up again after 30+ seconds. I am using 5 nodes, each with 8G of memory for the Java heap.

From my investigation, it was caused by the GC thread: I started JConsole and monitored heap usage, and each time a GC happened, heap usage dropped from 6G to 1G. Checking the Cassandra log, I found the freezes happened at exactly the same times.
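In case anyone wants to reproduce the monitoring step, I just attached JConsole over JMX; the host and port below are placeholders, and the actual JMX port depends on the -Dcom.sun.management.jmxremote.port setting in your launch script:

    jconsole <host>:<jmx-port>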

So I think that when using a large heap (>2G), we may need a different GC strategy than the default one provided by the Cassandra launch script. Has anyone else encountered this situation? If so, could you please provide some guidance?
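For example, I have been experimenting with variations along these lines in the launch script (bin/cassandra.in.sh in my install); the flag values here are only illustrative guesses, not tested recommendations:

    # Use CMS and start concurrent collection earlier, so the heap
    # hopefully never fills far enough to force a long stop-the-world
    # full GC. The 75% threshold is just my guess.
    JVM_OPTS="$JVM_OPTS -XX:+UseConcMarkSweepGC -XX:+UseParNewGC"
    JVM_OPTS="$JVM_OPTS -XX:+CMSParallelRemarkEnabled"
    JVM_OPTS="$JVM_OPTS -XX:CMSInitiatingOccupancyFraction=75"
    JVM_OPTS="$JVM_OPTS -XX:+UseCMSInitiatingOccupancyOnly"

I don't know yet whether this is the right approach, so any guidance would be appreciated.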


Thanks
-Santal

2010/2/17 Tatu Saloranta <tsaloranta@gmail.com>
On Tue, Feb 16, 2010 at 6:25 AM, Boris Shulman <shulmanb@gmail.com> wrote:
> Hello, I'm running some benchmarks on 2 Cassandra nodes, each running
> on an 8-core machine with 16G RAM and 10G for the Java heap. I've
> noticed that during benchmarks with numerous writes Cassandra just
> freezes for several minutes (in those benchmarks I'm writing batches
> of 10 columns with 1K of data each for every key in a single CF).
> Usually after performing 50K writes I get a TimeOutException and
> Cassandra just freezes. What configuration changes can I make in
> order to prevent this? Is it possible that my setup just can't handle
> the load? How can I calculate the number of Cassandra nodes needed
> for a desired load?

One thing that can cause seeming lockups is the garbage collector, so
enabling GC debug output would be helpful to see GC activity. Some
collectors (CMS specifically) can stop the system for a very long
time, up to minutes. This is not necessarily the root cause, but it is
easy to rule out.
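On HotSpot JVMs that usually means adding something like the following
to the startup options (the log path is just an example):

    -verbose:gc -XX:+PrintGCDetails -XX:+PrintGCTimeStamps \
    -XX:+PrintGCApplicationStoppedTime -Xloggc:/var/log/cassandra/gc.log

Long pauses then show up in the log as individual GC events taking
seconds or more.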
Beyond this, getting a stack trace during a lockup would make sense.
That can pinpoint what the threads are doing, or what they are blocked
on in case there is a deadlock or heavy contention on some shared
resource.
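If you have a full JDK installed, jstack is the easiest way to get
one; sending SIGQUIT also works and dumps the threads to the process's
stdout:

    jstack <cassandra-pid> > threads.txt
    # or:
    kill -QUIT <cassandra-pid>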

-+ Tatu +-