Uncomment the following lines in "cassandra-env.sh".

JVM_OPTS="$JVM_OPTS -XX:+PrintGCDateStamps"
JVM_OPTS="$JVM_OPTS -XX:+PrintPromotionFailure"
JVM_OPTS="$JVM_OPTS -Xloggc:/var/log/cassandra/gc-`date +%s`.log"

> Also can you take a heap dump at 2 diff points so that we can compare it?

No, I'm afraid not. I ordinarily use profiling tools, but I am not aware of any that could respond during this event.


On Sun, Jun 16, 2013 at 4:44 AM, Mohit Anchlia <mohitanchlia@gmail.com> wrote:
Can you paste your gc config? Also can you take a heap dump at 2 diff points so that we can compare it?

A quick thing to do would be to take a live histogram (jmap -histo:live) at 2 points and compare them.
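
A minimal sketch of that comparison, assuming the JDK's jmap is on the PATH and the Cassandra PID is known. The helper name histo_growth and the file names are made up for illustration. Note that -histo:live forces a full GC before counting, so it adds a pause of its own on a node that is already struggling.

```shell
# Hypothetical helper: print the classes whose live bytes grew between
# two `jmap -histo:live` dumps, largest growth first.
histo_growth() {
  awk '
    # Data rows look like "   1:   12345   678900  [B"; skip header and totals.
    $1 ~ /^[0-9]+:$/ {
      if (FNR == NR) before[$4] = $3                      # first dump: bytes per class
      else if (($3 + 0) > (before[$4] + 0)) print $3 - before[$4], $4
    }
  ' "$1" "$2" | sort -rn | head -n 20
}

# Usage (the PID and the 10-minute gap are assumptions, adjust as needed):
#   jmap -histo:live "$pid" > histo-1.txt
#   sleep 600
#   jmap -histo:live "$pid" > histo-2.txt
#   histo_growth histo-1.txt histo-2.txt
```

Each output line is the byte growth followed by the class name, so a steadily growing [B (byte array) entry would be the first thing to look for.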


On Jun 15, 2013, at 6:57 AM, Takenori Sato <tsato@cloudian.com> wrote:

INFO [ScheduledTasks:1] 2013-04-15 14:00:02,749 GCInspector.java (line 122) GC for ParNew: 338798 ms for 1 collections, 592212416 used; max is 1046937600

This says that a GC of the New Generation took a very long time (338798 ms), which is unusual.
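
To put the numbers in that line into perspective: 338798 ms is roughly 5.6 minutes, and 592212416 of 1046937600 bytes is about 57% of the maximum heap. A small sketch that pulls the same figures out of any GCInspector line in this format (the function name gc_summary is made up for illustration, and the pattern assumes the exact wording shown above):

```shell
# Hypothetical helper: read GCInspector lines on stdin and print the pause
# length in seconds and the heap occupancy as a percentage of max.
gc_summary() {
  sed -n 's/.*GC for \([A-Za-z]*\): \([0-9]*\) ms for \([0-9]*\) collections, \([0-9]*\) used; max is \([0-9]*\).*/\1 \2 \4 \5/p' |
  awk '{ printf "%s: %.1f s pause, heap %.0f%% full\n", $1, $2 / 1000, 100 * $3 / $4 }'
}

# Usage (the log path is an assumption, adjust to your install):
#   grep GCInspector /var/log/cassandra/system.log | gc_summary
```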

The only situation I am aware of is when a fairly large object is created that cannot be promoted to the Old Generation, because it requires a large *contiguous* memory space that is unavailable at that point in time. This is called promotion failure. The collection then has to wait until the concurrent collector frees a large enough space, and so you experience stop the world. But I think it is not a stop of the whole world, only of the new world.

For example, in the case of Cassandra, a large value of in_memory_compaction_limit_in_mb can cause this. This is the size limit up to which a compaction merges the rows of a key into the latest version in memory. So this creates a large byte array of up to that size.

You can confirm this the next time it happens by enabling promotion failure GC logging, and by checking which compactions were running at that point in time.
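
A rough sketch of both checks. Assumptions: HotSpot writes "promotion failed" (with -XX:+PrintGCDetails) or "promotion failure size" (with -XX:+PrintPromotionFailure) into the GC log, compaction activity shows up in system.log on lines containing "Compact", and the logs are under /var/log/cassandra. The function names are made up for illustration.

```shell
# Hypothetical helper: list GC log lines that report a promotion failure.
promotion_failures() {
  grep -H 'promotion fail' "$@"
}

# Hypothetical helper: list compaction-related log lines whose timestamp
# starts with the given prefix, e.g. "2013-04-15 13:5" for 13:50 to 13:59.
compactions_at() {
  grep -F "$1" "$2" | grep 'Compact'
}

# Usage (paths are assumptions, adjust to your install):
#   promotion_failures /var/log/cassandra/gc-*.log
#   compactions_at '2013-04-15 13:5' /var/log/cassandra/system.log
```

If a promotion failure and a compaction of a very wide row line up in time, that would support the explanation above.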



On Sat, Jun 15, 2013 at 10:01 AM, Robert Coli <rcoli@eventbrite.com> wrote:
On Fri, Jun 7, 2013 at 12:42 PM, Igor <igor@4friends.od.ua> wrote:
> If you are talking about 1.2.x then I also have memory problems on an idle
> cluster: java memory constantly and slowly grows up to the limit, then spends
> a long time in GC. I have never seen such behaviour in 1.0.x and 1.1.x, where
> on an idle cluster java memory stays at the same value.

If you are not aware of a pre-existing JIRA, I strongly encourage you to:

1) Document your experience of this.
2) Search issues.apache.org for anything that sounds similar.
3) If you are unable to find a JIRA, file one.

Thanks!

=Rob