cassandra-user mailing list archives

From: Gregory Szorc <>
Subject: RE: Nodes frozen in GC
Date: Mon, 21 Mar 2011 19:57:40 GMT
> With the large new-gen, you were actually seeing fallbacks to full GC?
> You weren't just still experiencing problems because at 10 gig, the new-gen
> will be so slow to compact as to effectively be similar to a full gc in terms of
> affecting latency?

Yes, we were seeing fallbacks to full GC with a large young generation.

Surprisingly, the young generation was still collecting quickly (under 0.25s).

Perhaps that means the young gen was full of fewer, larger objects?

Doing some experimentation now: with a 250MB young gen, parnew collections take around 0.01-0.02s
of wall time. With a 2GB young gen, they take around 0.02-0.07s. Finally, with ~10GB new (9.5GB
young), they take around 0.05-0.15s, with most being <0.1s. Even with no compactions occurring,
we occasionally see a 0.2s parnew collection.
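
For reference, these numbers come from GC logs produced with flags roughly along these lines
(the -Xmn value shown is just illustrative; it's whatever young gen size is being tested):

-Xmn9600m -XX:MaxTenuringThreshold=3 \
-XX:+UseParNewGC -XX:+UseConcMarkSweepGC \
-XX:+PrintGCTimeStamps -XX:+PrintGCDetails -XX:+PrintTenuringDistribution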

109.126: [GC 109.126: [ParNew
Desired survivor size 298254336 bytes, new threshold 3 (max 3)
- age   1:   34897008 bytes,   34897008 total
- age   2:   17763192 bytes,   52660200 total
- age   3:  135342264 bytes,  188002464 total
: 9524607K->204678K(9903232K), 0.0565180 secs] 9527767K->229803K(15146112K), 0.0567610 secs] [Times: user=0.31 sys=0.01, real=0.06 secs]
128.997: [GC 128.998: [ParNew
Desired survivor size 298254336 bytes, new threshold 2 (max 3)
- age   1:  294294360 bytes,  294294360 total
- age   2:   16201608 bytes,  310495968 total
- age   3:   17755920 bytes,  328251888 total
: 9500071K->349162K(9903232K), 0.1070140 secs] 9525196K->400440K(15146112K), 0.1072430 secs] [Times: user=0.60 sys=0.00, real=0.11 secs]

These were taken during compaction of a non-troublesome CF. We are still seeing new generation
allocations in the 1 GB/s range, while process read I/O reported by /proc/<pid>/io is about
135MB/s. A 7.5:1 ratio of memory allocations to read I/O does seem pretty high. Granted, the
host was servicing some thrift requests at the time, and these obviously contribute to object
allocations.

Without any compactions, we blow through the ~9.5GB young generation in 50-60s on average,
which means our baseline allocation rate (from servicing thrift requests and other misc background
work) is a (more reasonable) ~200MB/s. Assuming there isn't much else going on in Cassandra,
that means ~80% of the space allocated during compaction is coming from the compaction itself,
and that the memory allocated per byte read during compaction is pretty high (around 6:1).
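
Spelling out the back-of-envelope math (everything rounded, so treat these as ballpark figures):

  baseline:    ~9.5 GB young gen / 50-60 s           ~= 160-190 MB/s, call it ~200 MB/s
  compaction:  ~1 GB/s total - ~0.2 GB/s baseline    ~= 0.8 GB/s, i.e. ~80% of allocations
  ratio:       ~0.8 GB/s allocated / 135 MB/s read   ~= 6:1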

I understand there will be some overhead from the language and runtime, but a ratio in the 6:1
range of JVM allocations to I/O reads seems a bit high. Since we are talking about many gigabytes
of memory, I would expect the JVM allocation size to be dominated by the column values. This
leads me to believe that the column value buffers are being excessively copied, or that the
sstables aren't being read as efficiently as possible. Whatever the root cause, there definitely
seems to be room for improvement. But I'm not a Java expert and don't know the compaction
algorithm well, so maybe a ratio of 6:1 is pretty good.
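
To make the suspicion concrete, here's a purely illustrative sketch (this is not Cassandra's
compaction code; every class and method name below is made up) of how a handful of per-value
copies turns X bytes read from disk into several times X bytes allocated on the heap:

// Purely illustrative -- every name here is hypothetical, not Cassandra's compaction path.
public class CopyOverheadSketch {
    // Stand-in for reading one column value off disk into a fresh buffer.
    static byte[] readValueFromDisk(int size) {
        return new byte[size];          // allocation #1: the read buffer
    }

    // Stand-in for deserializing the raw bytes into a column object.
    static byte[] deserialize(byte[] raw) {
        return raw.clone();             // allocation #2: the column's copy of the value
    }

    // Stand-in for re-serializing the column into the output sstable's buffer.
    static byte[] serialize(byte[] value) {
        return value.clone();           // allocation #3: the output copy
    }

    public static void main(String[] args) {
        long read = 0, allocated = 0;
        for (int i = 0; i < 1000000; i++) {
            byte[] raw = readValueFromDisk(1024);
            byte[] value = deserialize(raw);
            byte[] out = serialize(value);
            read += raw.length;
            allocated += raw.length + value.length + out.length;
        }
        System.out.printf("read=%d MB allocated=%d MB ratio=%.1f:1%n",
                read >> 20, allocated >> 20, (double) allocated / read);
    }
}

Three copies per value already gives 3:1; intermediate buffers and iterator overhead on top of
that would make a 6:1 allocation-to-read ratio easy to hit, which is also why it feels like
there should be room to shave copies.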

> If there is any suspicion that the above is happening, maybe try decreasing
> in_memory_compaction_limit_in_mb (preparing to see lots of stuff logged
> to console, assuming that's still happening in the 0.6
> version you're running).

I don't believe an in-memory compaction limit config option exists in 0.6.

> I'm not sure on what you're basing that, but unless I have fatally failed to
> grok something fundamental about the interaction between new-gen and
> old-gen with CMS, objects aren't being allocated *period* while the "young
> generation is being collected" as that is a stop-the-world pause. (This is also
> why I said before that at 10 gig new-gen size, the observed behavior on
> young gen collections may be similar to fallback-to-full-gc cases, but not quite
> since it would be parallel rather than serial)

The grok fail is probably on my end. I couldn't find any documentation to back either of our
claims, so I'll defer to your experience.

