In-Reply-To: <AANLkTik0uC_vX=8obMdnVcetia15qQa6AtnHoaaaq2yw@mail.gmail.com>
References: <AANLkTik0uC_vX=8obMdnVcetia15qQa6AtnHoaaaq2yw@mail.gmail.com>
Date: Fri, 10 Dec 2010 19:37:06 +0100
Message-ID: <AANLkTi=u5gNFhc9mmm8mD3K0d7oqGY74wAmV5nE4PhcT@mail.gmail.com>
Subject: Re: Memory leak with Sun Java 1.6 ?
From: Peter Schuller <peter.schuller@infidyne.com>
To: user@cassandra.apache.org

> Over the past month or so, it looks like memory has slowly
> been exhausted. Both nodetool drain and jmap can't run, and
> produce this error:
>
>     Error occurred during initialization of VM
>     Could not reserve enough space for object heap
>
> We've got Xmx/Xms set to 4GB.
>
> top shows free memory around 50-80MB, file cache under
> 10MB, and the java process at 12+GB virt and 7.1GB res.
>
> This feels like a Java problem, not a Cassandra one, but I'm
> open to suggestions. To ensure I don't get bothered over
> the weekend we're doing a rolling restart of Cassandra on
> each of the boxes now. The last time they were restarted
> was just over a month ago. Now I'm wondering whether I
> should (until 0.7.1 is available) schedule in a slower rolling
> restart over several days, every few weeks.

Memory-mapped files count toward the virtual size of the process and,
to the extent that they are resident in memory, toward its resident
size as well. However, your graph:

> I've shared a Zabbix graph of system memory at:
>
>     http://www.imagebam.com/image/3b4213110283969

certainly indicates that this is not the explanation, since you would
then be seeing "cached" occupy the remainder of memory above the heap
size. In addition, the allocation failures from jmap indicate that
memory is truly short.
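
If you want to check the mmap theory directly rather than infer it from
the graph, something along these lines should work. It is only a rough
sketch: it assumes a Linux box where /proc/<pid>/smaps is readable, and
the function name and the "file"/"other" split are my own invention,
not anything from Cassandra or the JVM.

```python
import re

# Header line of an smaps entry: address range, perms, offset, dev, inode, path.
HEADER = re.compile(
    r"^[0-9a-f]+-[0-9a-f]+\s+\S+\s+[0-9a-f]+\s+\S+\s+\d+\s*(.*)$"
)


def rss_by_backing(smaps_text):
    """Sum resident kB per mapping type from /proc/<pid>/smaps text.

    "file" means the mapping has a path starting with "/" (for example a
    memory-mapped data file); everything else, including anonymous memory
    and [heap]/[stack], is counted as "other".
    """
    totals = {"file": 0, "other": 0}
    kind = "other"
    for line in smaps_text.splitlines():
        match = HEADER.match(line)
        if match:
            # A new mapping starts; classify it by its path column.
            kind = "file" if match.group(1).startswith("/") else "other"
        elif line.startswith("Rss:"):
            # "Rss:   512 kB" -> add 512 to the current mapping's bucket.
            totals[kind] += int(line.split()[1])
    return totals
```

Feed it the contents of /proc/<pid>/smaps for the Cassandra process. If
"file" is small and "other" is far above the 4GB heap, then mmap is not
where the resident memory is going.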

Just to confirm, what does the "-/+ buffers/cache" line show if you run
'free'? (I.e., the middle line, under the 'free' column.)
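
For reference, the number I am after is roughly MemFree + Buffers +
Cached, i.e. what would be available if the kernel dropped its caches.
A minimal sketch of that arithmetic, assuming /proc/meminfo-style input;
the function names and the sample figures are made up for illustration
(loosely in the range of what you reported from top):

```python
def parse_meminfo(text):
    """Parse /proc/meminfo-style text into a dict of integer kB values."""
    values = {}
    for line in text.splitlines():
        name, sep, rest = line.partition(":")
        if not sep:
            continue
        fields = rest.split()
        if fields and fields[0].isdigit():
            values[name.strip()] = int(fields[0])
    return values


def free_without_cache_kb(meminfo):
    """Free memory counting buffers and page cache as reclaimable."""
    return (
        meminfo["MemFree"]
        + meminfo.get("Buffers", 0)
        + meminfo.get("Cached", 0)
    )


# Hypothetical sample, not real output from the affected machine.
SAMPLE = """\
MemFree:           65536 kB
Buffers:            2048 kB
Cached:             8192 kB
"""

sample_free_kb = free_without_cache_kb(parse_meminfo(SAMPLE))
```

On a real box you would pass in open("/proc/meminfo").read() instead of
SAMPLE. If the result is only a few tens of MB, the memory really is
gone and is not just sitting in cache.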

A memory leak in the Java process would most likely be in memory that
is not part of the managed heap (since I think it's unlikely that the
JVM fails to limit the actual heap size). The question is what...

To cargo cult it: Are you running a modern JVM? (Not e.g. openjdk b17
in lenny or some such.) If it is a JVM issue, making sure you're on a
reasonably recent JVM is probably much easier than starting to track
it down...

-- 
/ Peter Schuller