Thank you, that helps.

Because I have about 150 GB of data on each node, I set the heap to 8 GB, just to give Cassandra enough space to cache the key index.
 
I think reducing the heap size is worth trying. I will split one Cassandra instance into two sub-nodes on the same physical server, each with a 4 GB heap and each handling about 75 GB of data. I will try it and report back if the results are positive.
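A minimal sketch of what the two-instance setup might look like: each instance gets its own configuration directory and its own heap cap in its launch script. The paths, ports, and directory names below are assumptions for illustration, not part of the original plan.

```shell
# Hypothetical layout: two Cassandra instances on one physical server,
# e.g. /opt/cassandra-a and /opt/cassandra-b, each with its own data
# directories and its own listen/storage ports so they do not collide.
#
# In each instance's bin/cassandra.in.sh, cap the heap at 4 GB
# (fixing -Xms to the same value avoids heap resizing pauses):
JVM_OPTS="$JVM_OPTS -Xms4g -Xmx4g"
```

Each JVM then has a smaller old generation to collect, which is the point of the experiment.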

BTW: somebody on my team told me that if the data managed by Cassandra is too large relative to the heap (more than ~15x the heap size), it will cause performance issues. Is this true?


Regards
Santal

2010/2/21 Tatu Saloranta <tsaloranta@gmail.com>
On Fri, Feb 19, 2010 at 7:40 PM, Santal Li <santal.li@gmail.com> wrote:
> I met almost the same thing as you. When I ran some benchmark write
> tests, sometimes one Cassandra node would freeze, and the other nodes
> would consider it down until it came back 30+ seconds later. I am
> using 5 nodes, each with 8 GB of memory for the Java heap.
>
> From my investigation, it was caused by the GC: I started JConsole to
> monitor heap usage, and each time a GC happened, heap usage dropped
> from 6 GB to 1 GB. Checking the Cassandra log, I found the freezes
> happened at exactly the same times.

With such a big heap, old-generation GCs can definitely take a while.
With just a 1.5 GB heap, and with reasonably efficient parallel
collection (on a multi-core machine), we had trouble keeping
collections below 5 seconds. But this depends a lot on the survival
ratio -- the less garbage there is (and the more live objects), the
slower things are. And the relationship is super-linear too, so
processing 6 GB (or whatever part of that is old-generation space) can
take a long time.

It is certainly worth keeping in mind that more memory generally means
longer GC pauses.
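Before changing anything, it is worth measuring how long collections actually take. A sketch of enabling the JVM's GC log with standard HotSpot flags (pre-Java 9 syntax; the log path is an assumption), added to the launch script's options:

```shell
# Standard HotSpot GC-logging flags; the log path is an example.
# PrintGCApplicationStoppedTime reports total stop-the-world time,
# which is what the other nodes perceive as a "freeze".
JVM_OPTS="$JVM_OPTS \
  -verbose:gc \
  -XX:+PrintGCDetails \
  -XX:+PrintGCTimeStamps \
  -XX:+PrintGCApplicationStoppedTime \
  -Xloggc:/var/log/cassandra/gc.log"
```

Correlating the timestamps in this log with the failure-detector events is a quick way to confirm (or rule out) GC as the cause of the freezes.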

But Jonathan is probably right that this alone would not cause the
appearance of a freeze -- rather, GC blocking processing AND an
accumulation of new requests in the meantime sounds more plausible.
It is still worth considering both parts of the puzzle: preventing the
overload that can turn a bad situation into a catastrophe, and
reducing the impact of GC.

> So I think when using a huge heap (>2 GB), we may need a different GC
> strategy than the default one provided by the Cassandra launch
> script. Has anyone else met this situation? Can you provide some
> guidance?

There are many ways to change GC settings, specifically to reduce the
impact of old-generation collections (young-generation collections are
less often problematic, although they can be tuned as well).
Often there is a trade-off between the frequency and the impact of GC:
to simplify, the less often you configure it to occur (for example, by
increasing the heap), the more impact it usually has when it does
occur.
Concurrent collectors (like the traditional CMS) are good for steady
state, and can keep old-gen GC from occurring for hours at a stretch
(doing incremental, concurrent "partial" collections). But they can
also lead to GC-from-hell when they must fall back to a full,
stop-the-world collection.
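A minimal sketch of enabling CMS with an earlier initiating threshold; the flags are standard HotSpot options of that era, but the occupancy value is illustrative, not a recommendation:

```shell
# Enable CMS for the old generation and ParNew for the young generation.
# Starting the concurrent cycle earlier (at 75% old-gen occupancy here)
# trades some CPU for a lower chance of the concurrent collector losing
# the race and falling back to a stop-the-world full GC.
JVM_OPTS="$JVM_OPTS \
  -XX:+UseParNewGC \
  -XX:+UseConcMarkSweepGC \
  -XX:CMSInitiatingOccupancyFraction=75 \
  -XX:+UseCMSInitiatingOccupancyOnly"
```

Whether this helps depends heavily on the workload, so it should be validated against the GC log rather than adopted blindly.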

There is a ton of information on how to deal with GC settings, but
unfortunately it is a bit of a black art and very dependent on your
specific use case. The fact that there are dozens (more than a
hundred, I think) of different switches makes it trickier still, since
you also need to learn which ones matter, and in which combinations.

One somewhat counter-intuitive suggestion is to reduce the size of the
heap, at least with respect to caching: mostly try to keep just the
live working set in memory, and avoid caching inside the Java process.
Operating systems are pretty good at caching disk pages; and if the
storage engine is out of process (like native BDB), this can
significantly reduce GC work. In-process caches can be really bad for
GC activity, because their contents are potentially long-lived, yet
relatively transient (that is, neither mostly live nor mostly garbage,
making the GC optimizer try in vain to compact things).
But once again, this may or may not help, and needs to be experimented with.

Not sure if the above helps, but I hope it gives at least some ideas,

-+ Tatu +-