Date: Fri, 10 Dec 2010 19:37:06 +0100
Subject: Re: Memory leak with Sun Java 1.6 ?
From: Peter Schuller
To: user@cassandra.apache.org

> Over the past month or so, it looks like memory has slowly
> been exhausted.  Both nodetool drain and jmap can't run, and
> produce this error:
>
>     Error occurred during initialization of VM
>     Could not reserve enough space for object heap
>
> We've got Xmx/Xms set to 4GB.
>
> top shows free memory around 50-80MB, file cache under
> 10MB, and the java process at 12+GB virt and 7.1GB res.
>
> This feels like a Java problem, not a Cassandra one, but I'm
> open to suggestions.  To ensure I don't get bothered over
> the weekend we're doing a rolling restart of Cassandra on
> each of the boxes now.  The last time they were restarted
> was just over a month ago.  Now I'm wondering whether I
> should (until 0.7.1 is available) schedule in a slower rolling
> restart over several days, every few weeks.

Memory-mapped files count toward both the virtual size and, to the
extent that they are resident in memory, the resident size of the
process. However, your graph:

> I've shared a Zabbix graph of system memory at:
>
>     http://www.imagebam.com/image/3b4213110283969

certainly indicates that this is not the explanation, since you would
then be seeing 'cached' occupy the remainder of memory above the heap
size. In addition, the allocation failures from jmap indicate that
memory is truly short.

Just to confirm, what does the free +/- buffers show if you run
'free'? (I.e., middle line, under 'free' column)

A memory leak in the Java process would most likely be in memory that
is not managed as part of the heap (since I think it's unlikely that
the JVM fails to limit the actual heap size). The question is what....

To cargo cult it: Are you running a modern JVM? (Not e.g. openjdk b17
in lenny or some such.)
If it is a JVM issue, ensuring you're using a reasonably recent JVM is
probably much easier than starting to track it down...

-- 
/ Peter Schuller
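
For reference, the "free +/- buffers" figure asked about above can
also be derived from /proc/meminfo. The following is only a minimal
sketch, assuming a Linux box whose /proc/meminfo exposes the MemFree,
Buffers and Cached fields; the helper names parse_meminfo and
free_plus_buffers_cache_kb are made up for illustration and are not
part of any tool mentioned in this thread.

```python
def parse_meminfo(text):
    """Parse /proc/meminfo-style text into a dict of integer values.

    Most fields are reported in kB; a few (e.g. HugePages_Total) are
    plain counts. Only the leading number on each line is kept.
    """
    values = {}
    for line in text.splitlines():
        key, sep, rest = line.partition(":")
        if not sep:
            continue  # skip anything that is not "Name: value"
        fields = rest.split()
        if fields:
            values[key.strip()] = int(fields[0])
    return values


def free_plus_buffers_cache_kb(meminfo):
    """Memory that is free or reclaimable from buffers and page cache.

    This mirrors the 'free' column of the "-/+ buffers/cache" line:
    MemFree + Buffers + Cached, in kB.
    """
    return meminfo["MemFree"] + meminfo["Buffers"] + meminfo["Cached"]


if __name__ == "__main__":
    # Assumption: running on Linux, where /proc/meminfo is available.
    with open("/proc/meminfo") as f:
        info = parse_meminfo(f.read())
    print("free +/- buffers/cache: %d MB"
          % (free_plus_buffers_cache_kb(info) // 1024))
```

If that number is close to the plain "free" value, the page cache has
already been squeezed out and memory really is short, rather than
merely being used for caching.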