From: Zhu Han <schumi.han@gmail.com>
To: user@cassandra.apache.org
Cc: dev@cassandra.apache.org
Reply-To: user@cassandra.apache.org
Date: Sat, 18 Dec 2010 11:15:00 +0800
Subject: Re: [SOLVED] Very high memory utilization (not caused by mmap on sstables)
References: <3CB61C05-8FEE-4C88-B179-EBBD488E77A1@backupify.com>

It seems the problem is still there after I upgraded to "OpenJDK Runtime
Environment (IcedTea6 1.9.2)". So it is not related to the bug I reported
two days ago.

Can somebody else share some info with us? Which Java environment do you
use? Is it stable for long-lived Cassandra instances?

best regards,
hanzhu


On Thu, Dec 16, 2010 at 9:28 PM, Zhu Han <schumi.han@gmail.com> wrote:

> I've tried it. But it does not work for me this afternoon.
>
> Thank you!
>
> best regards,
> hanzhu
>
>
> On Thu, Dec 16, 2010 at 8:59 PM, Matthew Conway <matt@backupify.com> wrote:
>
>> Thanks for debugging this, I'm running into the same problem.
>> BTW, if you can ssh into your nodes, you can use jconsole over ssh:
>> http://simplygenius.com/2010/08/jconsole-via-socks-ssh-tunnel.html
>>
>> Matt
>>
>>
>> On Dec 16, 2010, at 2:39 AM, Zhu Han wrote:
>>
>> > Sorry for the spam again. :-)
>> >
>> > I think I found the root cause. Here is a bug report [1] on a memory
>> > leak in ParNewGC. It is fixed in OpenJDK 1.6.0_20 (IcedTea6 1.9.2) [2].
>> >
>> > So the suggestion is: for those who run Cassandra on Ubuntu 10.04,
>> > please upgrade OpenJDK to the latest version.
>> >
>> > [1] http://bugs.sun.com/bugdatabase/view_bug.do?bug_id=6824570
>> > [2] http://blog.fuseyism.com/index.php/2010/09/10/icedtea6-19-released/
>> >
>> > best regards,
>> > hanzhu
>> >
>> >
>> > On Thu, Dec 16, 2010 at 3:10 PM, Zhu Han <schumi.han@gmail.com> wrote:
>> >
>> >> The test node is behind a firewall, so it took me some time to find
>> >> a way to get JMX diagnostic information from it.
>> >>
>> >> What's interesting is that both the HeapMemoryUsage and
>> >> NonHeapMemoryUsage reported by the JVM are quite reasonable. So it's
>> >> a mystery why the JVM process maps such a big anonymous memory
>> >> region...
>> >>
>> >> $ java -Xmx128m -jar /tmp/cmdline-jmxclient-0.10.3.jar - localhost:8080 java.lang:type=Memory HeapMemoryUsage
>> >> 12/16/2010 15:07:45 +0800 org.archive.jmx.Client HeapMemoryUsage:
>> >> committed: 1065025536
>> >> init: 1073741824
>> >> max: 1065025536
>> >> used: 18295328
>> >>
>> >> $ java -Xmx128m -jar /tmp/cmdline-jmxclient-0.10.3.jar - localhost:8080 java.lang:type=Memory NonHeapMemoryUsage
>> >> 12/16/2010 15:01:51 +0800 org.archive.jmx.Client NonHeapMemoryUsage:
>> >> committed: 34308096
>> >> init: 24313856
>> >> max: 226492416
>> >> used: 21475376
>> >>
>> >> If anybody is interested, I can provide more diagnostic information
>> >> before I restart the instance.
>> >>
>> >> best regards,
>> >> hanzhu
>> >>
>> >>
>> >> On Thu, Dec 16, 2010 at 1:00 PM, Zhu Han <schumi.han@gmail.com> wrote:
>> >>
>> >>> After investigating it more deeply, I suspect it's a native memory
>> >>> leak in the JVM. The large anonymous map in the lower address space
>> >>> should be the native heap of the JVM, not the Java object heap. Has
>> >>> anybody seen this before?
>> >>>
>> >>> I'll try to upgrade the JVM tonight.
>> >>>
>> >>> best regards,
>> >>> hanzhu
>> >>>
>> >>>
>> >>> On Thu, Dec 16, 2010 at 10:50 AM, Zhu Han <schumi.han@gmail.com> wrote:
>> >>>
>> >>>> Hi,
>> >>>>
>> >>>> I have a test node with apache-cassandra-0.6.8 on Ubuntu 10.04. The
>> >>>> hardware environment is an OpenVZ container. The JVM is:
>> >>>>
>> >>>> # java -Xmx128m -version
>> >>>> java version "1.6.0_18"
>> >>>> OpenJDK Runtime Environment (IcedTea6 1.8.2) (6b18-1.8.2-4ubuntu2)
>> >>>> OpenJDK 64-Bit Server VM (build 16.0-b13, mixed mode)
>> >>>>
>> >>>> These are the memory settings:
>> >>>>
>> >>>> /usr/bin/java -ea -Xms1G -Xmx1G ...
>> >>>>
>> >>>> And the on-disk footprint of the sstables is very small:
>> >>>>
>> >>>> # du -sh data/
>> >>>> 9.8M    data/
>> >>>>
>> >>>> The node was infrequently accessed in the last three weeks. After
>> >>>> that, I observed abnormal memory utilization in top:
>> >>>>
>> >>>>  PID USER  PR  NI   VIRT   RES  SHR S %CPU %MEM    TIME+ COMMAND
>> >>>> 7836 root  15   0  3300m  2.4g  13m S    0 26.0  2:58.51 java
>> >>>>
>> >>>> The JVM heap utilization is quite normal:
>> >>>>
>> >>>> # sudo jstat -gc -J"-Xmx128m" 7836
>> >>>>    S0C    S1C    S0U    S1U       EC       EU        OC        OU       PC       PU  YGC   YGCT  FGC   FGCT    GCT
>> >>>> 8512.0 8512.0  372.8    0.0  68160.0   5225.7  963392.0  508200.7  30604.0  18373.4  480  3.979    2  0.005  3.984
>> >>>>
>> >>>> Then I tried "pmap" to see the native memory mapping. There are two
>> >>>> large anonymous mmap regions:
>> >>>>
>> >>>> 00000000080dc000 1573568K rw---    [ anon ]
>> >>>> 00002b2afc900000 1079180K rw---    [ anon ]
>> >>>>
>> >>>> The second one should be the JVM heap. What is the first one? An
>> >>>> mmap of an sstable should never be an anonymous mmap, but a
>> >>>> file-based mmap. Is it a native memory leak? Does Cassandra
>> >>>> allocate any DirectByteBuffer?
>> >>>>
>> >>>> best regards,
>> >>>> hanzhu
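
The comparison made in the original report (large anonymous mappings from pmap versus the heap size the JVM itself reports) can be sketched roughly as follows. This is only an illustration, not something posted in the thread: the function names `large_anon_regions` and `unexplained_kb` and the 512 MiB threshold are assumptions, and the parser expects the plain `pmap` line format quoted above (address, size in K, permissions, name). The sample input reuses the two regions and the committed heap figure reported in the thread.

```python
import re

# Matches lines shaped like the pmap output quoted in this thread:
#   00000000080dc000 1573568K rw---    [ anon ]
PMAP_LINE = re.compile(r"^([0-9a-fA-F]+)\s+(\d+)K\s+(\S+)\s+(.*)$")


def large_anon_regions(pmap_text, min_kb=512 * 1024):
    """Return (address, size_kb) for anonymous mappings of at least min_kb."""
    regions = []
    for line in pmap_text.splitlines():
        match = PMAP_LINE.match(line.strip())
        if not match:
            continue  # skip the header, the total line, and anything unexpected
        address, size_kb, _perms, name = match.groups()
        if "anon" in name and int(size_kb) >= min_kb:
            regions.append((address, int(size_kb)))
    return regions


def unexplained_kb(regions, heap_committed_bytes):
    """Anonymous KB left over after setting aside the committed Java heap."""
    total_kb = sum(size for _addr, size in regions)
    return total_kb - heap_committed_bytes // 1024


# Sample data copied from the messages above (pmap regions, JMX "committed").
sample = """\
00000000080dc000 1573568K rw---    [ anon ]
00002b2afc900000 1079180K rw---    [ anon ]
"""
regions = large_anon_regions(sample)
for address, size_kb in regions:
    print(address, size_kb // 1024, "MiB")
print("unexplained KB:", unexplained_kb(regions, 1065025536))
```

A large positive remainder only says that some anonymous memory is not accounted for by the Java heap; it does not by itself identify the cause, which is the open question in the thread.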