Does it actually OOM eventually? A certain amount of object allocation for reads (or anything else) will make the heap creep up until a GC, but at ~500 MB or so of an 8 GB heap there is little reason for the JVM to collect, so it probably just skips it to save processing. Even the young gen won't require a collection at this size.
Which version of Cassandra are you running? Prior to 1.2, a lot of metadata about the sstables was kept on the heap and took considerable space, which could cause additional memory utilization.
I have about 8 heap dumps that I have been looking at. I have been trying to isolate
why I have been dumping heap. I started by removing the apps that write to
Cassandra and eliminating the work that would entail. I am left with just the apps that
are reading the data, and from the heap dumps it looks like Cassandra Column methods
are being called. Because there are so many objects, it is difficult to ascertain exactly what
the problem may be. That prompted my query: I am trying to quickly determine whether Cassandra
holds on to objects that have been used for reading, and if so, why, and more importantly whether
something can be done.
To get an accurate picture you should force a full GC on each node. The heap utilization can be misleading, since there can be a lot of things in the heap with no strong references.
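To illustrate why the raw number can mislead, here is a minimal sketch that reads used heap before and after requesting a collection through the standard `MemoryMXBean`. It measures the JVM it runs in, not a Cassandra node; the class and method names (`HeapAfterGc`, `usedBeforeAndAfterGc`) are made up for the example, and the GC call is only a request that the JVM may ignore (for example under `-XX:+DisableExplicitGC`).

```java
import java.lang.management.ManagementFactory;
import java.lang.management.MemoryMXBean;

public class HeapAfterGc {
    /** Returns {usedBeforeGc, usedAfterGc} in bytes for the current JVM. */
    static long[] usedBeforeAndAfterGc() {
        MemoryMXBean memory = ManagementFactory.getMemoryMXBean();
        long before = memory.getHeapMemoryUsage().getUsed();
        // Effectively the same as System.gc(): a request, not a guarantee.
        memory.gc();
        long after = memory.getHeapMemoryUsage().getUsed();
        return new long[] {before, after};
    }

    public static void main(String[] args) {
        // Create some short-lived garbage so there is something to reclaim.
        for (int i = 0; i < 1_000; i++) {
            byte[] garbage = new byte[64 * 1024];
        }
        long[] used = usedBeforeAndAfterGc();
        long mb = 1024 * 1024;
        System.out.printf("used before GC: %d MB%n", used[0] / mb);
        System.out.printf("used after GC:  %d MB%n", used[1] / mb);
    }
}
```

The "after" figure is the closer approximation of what is really being retained; the difference between the two is garbage that simply had not been collected yet.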
There are a number of factors that can lead to this. For a true comparison I would recommend using jconsole and calling dumpHeap on com.sun.management:type=HotSpotDiagnostic with the 2nd param set to true (force GC). Then open the heap dump in a tool like YourKit; you will get a better comparison, and it will also tell you what is taking the space.
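The same dumpHeap call can be made from code instead of jconsole. The following is a rough sketch, assuming the node exposes JMX without authentication; the class name `LiveHeapDump`, the helper `dumpLive`, and the command-line arguments are hypothetical. Host, port, and output path are whatever applies to your setup (7199 is the usual Cassandra JMX port, but check yours). Note that the dump file is written on the machine running the target JVM, must not already exist, and newer JDKs require the name to end in `.hprof`.

```java
import com.sun.management.HotSpotDiagnosticMXBean;
import java.io.IOException;
import java.lang.management.ManagementFactory;
import javax.management.MBeanServerConnection;
import javax.management.remote.JMXConnector;
import javax.management.remote.JMXConnectorFactory;
import javax.management.remote.JMXServiceURL;

public class LiveHeapDump {
    static final String BEAN = "com.sun.management:type=HotSpotDiagnostic";

    /** Asks the JVM behind conn to dump only live (reachable) objects to file. */
    static void dumpLive(MBeanServerConnection conn, String file) throws IOException {
        HotSpotDiagnosticMXBean diag = ManagementFactory.newPlatformMXBeanProxy(
                conn, BEAN, HotSpotDiagnosticMXBean.class);
        // live = true: unreachable objects are excluded, which implies a full GC first.
        diag.dumpHeap(file, true);
    }

    public static void main(String[] args) throws IOException {
        if (args.length != 3) {
            System.err.println("usage: LiveHeapDump <host> <jmxPort> <file.hprof>");
            return;
        }
        // Standard RMI-based JMX service URL; assumes no JMX authentication or SSL.
        JMXServiceURL url = new JMXServiceURL(
                "service:jmx:rmi:///jndi/rmi://" + args[0] + ":" + args[1] + "/jmxrmi");
        try (JMXConnector connector = JMXConnectorFactory.connect(url)) {
            dumpLive(connector.getMBeanServerConnection(), args[2]);
        }
    }
}
```

Running it once per node gives dumps taken under the same conditions, which makes the node-to-node comparison in a profiler more meaningful.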
I am currently looking at a 4 node cluster, and I have stopped all writing to
Cassandra while the reads continue. I'm trying to understand the utilization
of memory within the JVM. nodetool info on each of the nodes shows them all
growing in footprint, 2 of the three at a greater rate. On the restart of Cassandra
each was at about 100 MB; after 2 days, they are at:
Heap Memory (MB) : 798.41 / 3052.00
Heap Memory (MB) : 370.44 / 3052.00
Heap Memory (MB) : 549.73 / 3052.00
Heap Memory (MB) : 481.89 / 3052.00
Address  Rack  Status  State   Load     Owns    Token
x        1d    Up      Normal  4.38 GB  25.00%  0
x        1d    Up      Normal  4.17 GB  25.00%  42535295865117307932921825928971026432
x        1d    Up      Normal  4.19 GB  25.00%  85070591730234615865843651857942052864
x        1d    Up      Normal  4.14 GB  25.00%  127605887595351923798765477786913079296
What I'm not sure of is why the growth differs between the nodes, and why
that growth is being created by activity that is read only.
Is Cassandra caching and holding the read data?
I currently have caching turned off for the key/row. The info command also shows:
Key Cache : size 0 (bytes), capacity 0 (bytes), 0 hits, 0 requests, NaN recent hit rate, 14400 save period in seconds
Row Cache : size 0 (bytes), capacity 0 (bytes), 0 hits, 0 requests, NaN recent hit rate, 0 save period in seconds