From: Peter Schuller <scode@scode.org>
To: user@cassandra.apache.org
Date: Tue, 14 Dec 2010 19:38:04 +0100
Subject: Re: Memory leak with Sun Java 1.6 ?

> I can rule out the first 3.
> I was running cassandra with default settings, i.e. 1GB heap and 256M
> memtable. So, with 3 memtables + 1GB the JVM should run with >1.75G
> (although http://wiki.apache.org/cassandra/MemtableThresholds suggests
> increasing heap size only gently). Did so. A 4GB machine with a 2GB
> 64-bit JVM seemed to run stable for quite some time, but then also
> crashed with OOM. Looking at the heap dump it's always the same: nearly
> all memory bound in CompactionExecutor (ColumnFamilyStore /
> ConcurrentSkipListMap, respectively).

Note that the memtables are just data structures in memory and live in the
heap. They are not in addition to the heap. mmap():ed files will be, though.
For debugging purposes you may want to switch Cassandra to "standard" IO
mode instead of mmap. This carries a performance penalty, but the
virtual/resident sizes won't be polluted with mmap():ed data.

In general, unless you're hitting something particularly strange or just a
bug in Cassandra, you shouldn't be randomly getting OOMs unless you are
truly using that heap space.

What do you mean by "always bound in CompactionExecutor" - by what method
did you determine this to be the case? Note that memtables are
ConcurrentSkipListMaps, so they are expected to contain a lot of data (and
may be significant themselves if lots of small items are stored).

> This looks like somebody else recently had a similar problem (-> bottom
> line: more heap - which is okay, but I'd like to understand why):
> http://www.mail-archive.com/user@cassandra.apache.org/msg07516.html

There should be no magic need for memory. Unless you are severely taxing it
in terms of very high write load or similar, an out-of-the-box configured
cassandra should need limited amounts of memory.

Did you run with default memtable thresholds (memtable_throughput_in_mb in
particular)? How many column families do you have? You need memory for
*each* column family, taking into account its memtable thresholds.
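As a rough back-of-the-envelope check (my own sketch, not an official formula), worst-case memtable heap pressure is roughly the per-CF throughput threshold, times the number of memtables that can be live at once (active plus flushing), times a JVM object-overhead factor, times the number of column families:

```python
# Rough, hedged estimate of worst-case memtable heap pressure.
# The 2.5x overhead factor is an assumption on my part:
# ConcurrentSkipListMap entries and column objects cost far more
# heap than the raw serialized bytes the threshold counts.
def memtable_heap_estimate_mb(throughput_mb=256, column_families=1,
                              memtables_in_flight=3, overhead=2.5):
    return throughput_mb * memtables_in_flight * overhead * column_families

# Default 256 MB threshold, one CF: up to ~1920 MB of heap in the
# worst case, which already exceeds a 1 GB heap on its own.
print(memtable_heap_estimate_mb())
```

Even with one CF this shows why the default thresholds can be uncomfortable in a 1 GB heap; with several CFs the figure multiplies.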
> This is my only CF currently in use (via JMX):
>
> - column_families:
>   - column_type: Standard
>     comment: tracking column family
>     compare_with: org.apache.cassandra.db.marshal.UTF8Type
>     default_validation_class: org.apache.cassandra.db.marshal.UTF8Type
>     gc_grace_seconds: 864000
>     key_cache_save_period_in_seconds: 3600
>     keys_cached: 200000.0
>     max_compaction_threshold: 32
>     memtable_flush_after_mins: 60
>     min_compaction_threshold: 4
>     name: tracking
>     read_repair_chance: 1.0
>     row_cache_save_period_in_seconds: 0
>     rows_cached: 0.0
>   name: test
>   replica_placement_strategy: org.apache.cassandra.locator.SimpleStrategy
>   replication_factor: 3

This is the only column family being used?

> In addition... actually there is plenty of free memory on the heap (?):
>
> 3605.478: [GC 3605.478: [ParNew
> Desired survivor size 2162688 bytes, new threshold 1 (max 1)
> - age   1:     416112 bytes,     416112 total
> : 16887K->553K(38336K), 0.0209550 secs]3605.499: [CMS: 1145267K->447565K(2054592K), 1.9143630 secs] 1161938K->447565K(2092928K), [CMS Perm : 18186K->18158K(30472K)], 1.9355340 secs] [Times: user=1.95 sys=0.00, real=1.94 secs]
> 3607.414: [Full GC 3607.414: [CMS: 447565K->447453K(2054592K), 1.9694960 secs] 447565K->447453K(2092928K), [CMS Perm : 18158K->18025K(30472K)], 1.9696450 secs] [Times: user=1.92 sys=0.00, real=1.97 secs]

1.9 seconds to do [CMS: 1145267K->447565K(2054592K)] is completely abnormal
if that represents a pause (but not if it's just concurrent mark/sweep
time). I don't quite recognize the format of this log... I'm suddenly
unsure what this log output is coming from.
A normal -XX:+PrintGC and -XX:+PrintGCDetails should yield stuff like:

22.634: [GC 22.634: [ParNew: 19136K->2112K(19136K), 0.0058490 secs] 123155K->108012K(202688K), 0.0059050 secs] [Times: user=0.02 sys=0.00, real=0.00 secs]
22.660: [GC 22.660: [ParNew: 19136K->2112K(19136K), 0.0063220 secs] 125036K->109893K(202688K), 0.0063910 secs] [Times: user=0.02 sys=0.00, real=0.01 secs]
22.686: [GC 22.686: [ParNew: 19136K->2112K(19136K), 0.0067150 secs] 126917K->111864K(202688K), 0.0067730 secs] [Times: user=0.02 sys=0.00, real=0.01 secs]
22.724: [CMS-concurrent-abortable-preclean: 0.041/0.257 secs] [Times: user=0.24 sys=0.01, real=0.26 secs]
22.724: [GC[YG occupancy: 19037 K (19136 K)]22.724: [Rescan (parallel) , 0.0085230 secs]22.732: [weak refs processing, 0.0142440 secs] [1 CMS-remark: 109752K(183552K)] 128790K(202688K), 0.0228750 secs] [Times: user=0.03 sys=0.00, real=0.02 secs]
22.747: [CMS-concurrent-sweep-start]
22.749: [GC 22.749: [ParNew: 19136K->2112K(19136K), 0.0082060 secs] 128813K->113837K(202688K), 0.0082620 secs] [Times: user=0.02 sys=0.00, real=0.01 secs]
22.772: [GC 22.772: [ParNew: 19136K->1500K(19136K), 0.0064660 secs] 116825K->101174K(202688K), 0.0065400 secs] [Times: user=0.02 sys=0.00, real=0.01 secs]
22.804: [CMS-concurrent-sweep: 0.041/0.057 secs] [Times: user=0.12 sys=0.00, real=0.05 secs]
22.804: [CMS-concurrent-reset-start]
22.816: [CMS-concurrent-reset: 0.012/0.012 secs] [Times: user=0.02 sys=0.00, real=0.02 secs]
22.910: [GC 22.910: [ParNew: 18524K->2112K(19136K), 0.0024290 secs] 98398K->81987K(202688K), 0.0024860 secs] [Times: user=0.01 sys=0.00, real=0.00 secs]
22.938: [GC 22.938: [ParNew: 19136K->2112K(19136K), 0.0077940 secs] 99011K->83980K(202688K), 0.0078510 secs] [Times: user=0.02 sys=0.00, real=0.01 secs]
22.965: [GC 22.965: [ParNew: 19136K->2112K(19136K), 0.0063160 secs] 101004K->85959K(202688K), 0.0063760 secs] [Times: user=0.02 sys=0.00, real=0.01 secs]

Where you
can clearly see which entries are stop-the-world pauses and which are just
timings of CMS activity in the background. Yours seems to be somewhere in
between; I vaguely recognize the format but I don't remember what produces
it... Anyway, it certainly looks like you have reasonable heap usage going
on, as expected.

I'm concerned since you reported that independent JVMs like nodetool
suffered OOMs as well; that really indicates there is a real problem of
some kind.

--
/ Peter Schuller
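To tell the two apart mechanically, a small script can tag each line: ParNew, Full GC, and CMS-remark/Rescan entries stop the application, while CMS-concurrent-* phases run alongside it. This is a heuristic sketch of my own, written against the log format shown above, not part of any JVM tooling:

```python
import re

# Classify HotSpot CMS GC log lines as stop-the-world pauses vs.
# concurrent background phases. Heuristic: real logs vary by JVM
# version and by which -XX:+Print* flags are enabled.
def classify(line):
    if "CMS-concurrent" in line:
        return "concurrent"   # background phase; app keeps running
    if re.search(r"\[(GC|Full GC|ParNew)", line) or "CMS-remark" in line:
        return "pause"        # stop-the-world; real= time is app stall
    return "other"

log = [
    "22.634: [GC 22.634: [ParNew: 19136K->2112K(19136K), 0.0058490 secs]",
    "22.724: [CMS-concurrent-abortable-preclean: 0.041/0.257 secs]",
    "3607.414: [Full GC 3607.414: [CMS: 447565K->447453K(2054592K), 1.9694960 secs]",
]
for line in log:
    print(classify(line), "|", line)
```

Run against a full log, a string of multi-second "pause" lines would confirm the abnormal-stall hypothesis; the same figures under "concurrent" would be harmless.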