Return-Path: X-Original-To: apmail-cassandra-user-archive@www.apache.org Delivered-To: apmail-cassandra-user-archive@www.apache.org Received: from mail.apache.org (hermes.apache.org [140.211.11.3]) by minotaur.apache.org (Postfix) with SMTP id 5F7C4109B5 for ; Tue, 5 Nov 2013 11:40:29 +0000 (UTC) Received: (qmail 19344 invoked by uid 500); 5 Nov 2013 11:39:45 -0000 Delivered-To: apmail-cassandra-user-archive@cassandra.apache.org Received: (qmail 19199 invoked by uid 500); 5 Nov 2013 11:39:30 -0000 Mailing-List: contact user-help@cassandra.apache.org; run by ezmlm Precedence: bulk List-Help: List-Unsubscribe: List-Post: List-Id: Reply-To: user@cassandra.apache.org Delivered-To: mailing list user@cassandra.apache.org Received: (qmail 19119 invoked by uid 99); 5 Nov 2013 11:39:26 -0000 Received: from nike.apache.org (HELO nike.apache.org) (192.87.106.230) by apache.org (qpsmtpd/0.29) with ESMTP; Tue, 05 Nov 2013 11:39:26 +0000 X-ASF-Spam-Status: No, hits=1.5 required=5.0 tests=HTML_MESSAGE,RCVD_IN_DNSWL_LOW,SPF_PASS X-Spam-Check-By: apache.org Received-SPF: pass (nike.apache.org: local policy includes SPF record at spf.trusted-forwarder.org) Received: from [74.125.82.53] (HELO mail-wg0-f53.google.com) (74.125.82.53) by apache.org (qpsmtpd/0.29) with ESMTP; Tue, 05 Nov 2013 11:39:20 +0000 Received: by mail-wg0-f53.google.com with SMTP id y10so3395730wgg.32 for ; Tue, 05 Nov 2013 03:38:58 -0800 (PST) X-Google-DKIM-Signature: v=1; a=rsa-sha256; c=relaxed/relaxed; d=1e100.net; s=20130820; h=x-gm-message-state:mime-version:in-reply-to:references:date :message-id:subject:from:to:content-type; bh=RVon2/Y9+zYHvxUMV2G0+I1OWfJ/LhyNyeIdJP95Ddg=; b=Gww+WQM797VQ/9YJAydSrLnlj0X0DWtJyz1W/3U+17fQgPq44GtTz3vMerbea2f2wk RNMgX71rNHn2CrXhXLlgPZ+AqXFJGZZnoqa1CUgesYZ3AN1vwCDScHvx0VbEJJtdTFq3 dEtJJMf9NFYvOA/XclgXapyFhU8L3vmFGZsRXTZhwxb7K4BVGysDHpicshteQA6/nDLZ vpWA5tST1gPlWhJnBrUhg6870jjpGOISI6Al0yeJzyct02ArSp2WZDIerpOUncULS2P0 P25bvebflJJrtJeNcTAevBkuJwTUyCrEPdo4VBkW5A8jp+xlnEo3AufJAvjTN+QWNQGh P1gg== X-Gm-Message-State: ALoCoQl8NEzCDgfol8bMBzEZHy0vcddLlWIgYt5KuRbBEfflRsM/AFSuPl4QMIWVQb9uAVJqgwz8 MIME-Version: 1.0 X-Received: by 10.180.12.45 with SMTP id v13mr16244998wib.57.1383651538532; Tue, 05 Nov 2013 03:38:58 -0800 (PST) Received: by 10.194.46.166 with HTTP; Tue, 5 Nov 2013 03:38:58 -0800 (PST) X-Originating-IP: [200.137.221.206] In-Reply-To: <5278B48C.6070200@avast.com> References: <5278B48C.6070200@avast.com> Date: Tue, 5 Nov 2013 09:38:58 -0200 Message-ID: Subject: Re: Cass 2.0.0: Extensive memory allocation when row_cache enabled From: =?ISO-8859-1?Q?S=E1vio_Teles?= To: user@cassandra.apache.org Content-Type: multipart/alternative; boundary=001a11c22ac072976804ea6c7aa4 X-Virus-Checked: Checked by ClamAV on apache.org --001a11c22ac072976804ea6c7aa4 Content-Type: text/plain; charset=ISO-8859-1 Content-Transfer-Encoding: quoted-printable We have the same problem. 2013/11/5 Jiri Horky > Hi there, > > we are seeing extensive memory allocation leading to quite long and > frequent GC pauses when using row cache. This is on cassandra 2.0.0 > cluster with JNA 4.0 library with following settings: > > key_cache_size_in_mb: 300 > key_cache_save_period: 14400 > row_cache_size_in_mb: 1024 > row_cache_save_period: 14400 > commitlog_sync: periodic > commitlog_sync_period_in_ms: 10000 > commitlog_segment_size_in_mb: 32 > > -XX:+UseThreadPriorities -XX:ThreadPriorityPolicy=3D42 -Xms10G -Xmx10G > -Xmn1024M -XX:+HeapDumpOnOutOfMemoryError > > -XX:HeapDumpPath=3D/data2/cassandra-work/instance-1/cassandra-1383566283-= pid1893.hprof > -Xss180k -XX:+UseParNewGC -XX:+UseConcMarkSweepGC > -XX:+CMSParallelRemarkEnabled -XX:SurvivorRatio=3D8 > -XX:MaxTenuringThreshold=3D1 -XX:CMSInitiatingOccupancyFraction=3D75 > -XX:+UseCMSInitiatingOccupancyOnly -XX:+UseTLAB -XX:+UseCondCardMark > > We have disabled row cache on one node to see the difference. Please > see attached plots from visual VM, I think that the effect is quite > visible. I have also taken 10x "jmap -histo" after 5s on a affected > server and plotted the result, attached as well. > > I have taken a dump of the application when the heap size was 10GB, most > of the memory was unreachable, which was expected. The majority was used > by 55-59M objects of HeapByteBuffer, byte[] and > org.apache.cassandra.db.Column classes. I also include a list of inbound > references to the HeapByteBuffer objects from which it should be visible > where they are being allocated. This was acquired using Eclipse MAT. > > Here is the comparison of GC times when row cache enabled and disabled: > > prg01 - row cache enabled > - uptime 20h45m > - ConcurrentMarkSweep - 11494686ms > - ParNew - 14690885 ms > - time spent in GC: 35% > prg02 - row cache disabled > - uptime 23h45m > - ConcurrentMarkSweep - 251ms > - ParNew - 230791 ms > - time spent in GC: 0.27% > > I would be grateful for any hints. Please let me know if you need any > further information. For now, we are going to disable the row cache. > > Regards > Jiri Horky > --=20 Atenciosamente, S=E1vio S. Teles de Oliveira voice: +55 62 9136 6996 http://br.linkedin.com/in/savioteles Mestrando em Ci=EAncias da Computa=E7=E3o - UFG Arquiteto de Software Laboratory for Ubiquitous and Pervasive Applications (LUPA) - UFG --001a11c22ac072976804ea6c7aa4 Content-Type: text/html; charset=ISO-8859-1 Content-Transfer-Encoding: quoted-printable
We have the same problem.


2013/11/5 Jiri Horky <horky@avast.com= >
Hi there,

we are seeing extensive memory allocation leading to quite long and
frequent GC pauses when using row cache. This is on cassandra 2.0.0
cluster with JNA 4.0 library with following settings:

key_cache_size_in_mb: 300
key_cache_save_period: 14400
row_cache_size_in_mb: 1024
row_cache_save_period: 14400
commitlog_sync: periodic
commitlog_sync_period_in_ms: 10000
commitlog_segment_size_in_mb: 32

-XX:+UseThreadPriorities -XX:ThreadPriorityPolicy=3D42 -Xms10G -Xmx10G
-Xmn1024M -XX:+HeapDumpOnOutOfMemoryError
-XX:HeapDumpPath=3D/data2/cassandra-work/instance-1/cassandra-1383566283-pi= d1893.hprof
-Xss180k -XX:+UseParNewGC -XX:+UseConcMarkSweepGC
-XX:+CMSParallelRemarkEnabled -XX:SurvivorRatio=3D8
-XX:MaxTenuringThreshold=3D1 -XX:CMSInitiatingOccupancyFraction=3D75
-XX:+UseCMSInitiatingOccupancyOnly -XX:+UseTLAB -XX:+UseCondCardMark

We have disabled row cache on one node to see =A0the =A0difference. Please<= br> see attached plots from visual VM, I think that the effect is quite
visible. I have also taken 10x "jmap -histo" after 5s on a affect= ed
server and plotted the result, attached as well.

I have taken a dump of the application when the heap size was 10GB, most of the memory was unreachable, which was expected. The majority was used by 55-59M objects of HeapByteBuffer, byte[] and
org.apache.cassandra.db.Column classes. I also include a list of inbound references to the HeapByteBuffer objects from which it should be visible where they are being allocated. This was acquired using Eclipse MAT.

Here is the comparison of GC times when row cache enabled and disabled:

prg01 - row cache enabled
=A0 =A0 =A0 - uptime 20h45m
=A0 =A0 =A0 - ConcurrentMarkSweep - 11494686ms
=A0 =A0 =A0 - ParNew - 14690885 ms
=A0 =A0 =A0 - time spent in GC: 35%
prg02 - row cache disabled
=A0 =A0 =A0 - uptime 23h45m
=A0 =A0 =A0 - ConcurrentMarkSweep - 251ms
=A0 =A0 =A0 - ParNew - 230791 ms
=A0 =A0 =A0 - time spent in GC: 0.27%

I would be grateful for any hints. Please let me know if you need any
further information. For now, we are going to disable the row cache.

Regards
Jiri Horky



--
Atenciosamente,
S=E1vio S. Teles de Oliveira
voic= e: +55 62 9136 6996
http://br.linkedin.com/in/savioteles
Mestrando em Ci=EAncias da Computa=E7=E3o - UFG
Arquiteto de Software
Laboratory for Ubiquitous and Pervasive Applications (LUP= A) - UFG
--001a11c22ac072976804ea6c7aa4--