Message-ID: <527A7C8C.8060709@gmail.com>
Date: Wed, 06 Nov 2013 12:29:48 -0500
From: Chris Burroughs
To: user@cassandra.apache.org
Subject: Re: Cass 2.0.0: Extensive memory allocation when row_cache enabled
References: <5278B48C.6070200@avast.com>
In-Reply-To: <5278B48C.6070200@avast.com>

Both caches involve several objects per entry (What do we want? Packed objects. When do we want them? Now!). The configured "size" is an estimate of the off-heap values only, not the total size or the number of entries. An acceptable size will depend on your data and access patterns. In one case we had a cluster that at 512 MB would go into a GC death spiral despite plenty of free heap (presumably just due to the number of objects), while empirically the cluster runs smoothly at 384 MB.

Your caches appear to be on the larger side. I suggest trying smaller values and only increasing them when doing so produces measurable, sustained gains.

On 11/05/2013 04:04 AM, Jiri Horky wrote:
> Hi there,
>
> we are seeing extensive memory allocation leading to quite long and
> frequent GC pauses when using row cache.
> This is on cassandra 2.0.0 cluster with JNA 4.0 library with following settings:
>
> key_cache_size_in_mb: 300
> key_cache_save_period: 14400
> row_cache_size_in_mb: 1024
> row_cache_save_period: 14400
> commitlog_sync: periodic
> commitlog_sync_period_in_ms: 10000
> commitlog_segment_size_in_mb: 32
>
> -XX:+UseThreadPriorities -XX:ThreadPriorityPolicy=42 -Xms10G -Xmx10G
> -Xmn1024M -XX:+HeapDumpOnOutOfMemoryError
> -XX:HeapDumpPath=/data2/cassandra-work/instance-1/cassandra-1383566283-pid1893.hprof
> -Xss180k -XX:+UseParNewGC -XX:+UseConcMarkSweepGC
> -XX:+CMSParallelRemarkEnabled -XX:SurvivorRatio=8
> -XX:MaxTenuringThreshold=1 -XX:CMSInitiatingOccupancyFraction=75
> -XX:+UseCMSInitiatingOccupancyOnly -XX:+UseTLAB -XX:+UseCondCardMark
>
> We have disabled row cache on one node to see the difference. Please
> see attached plots from visual VM, I think that the effect is quite
> visible. I have also taken 10x "jmap -histo" after 5s on a affected
> server and plotted the result, attached as well.
>
> I have taken a dump of the application when the heap size was 10GB, most
> of the memory was unreachable, which was expected. The majority was used
> by 55-59M objects of HeapByteBuffer, byte[] and
> org.apache.cassandra.db.Column classes. I also include a list of inbound
> references to the HeapByteBuffer objects from which it should be visible
> where they are being allocated. This was acquired using Eclipse MAT.
>
> Here is the comparison of GC times when row cache enabled and disabled:
>
> prg01 - row cache enabled
>   - uptime 20h45m
>   - ConcurrentMarkSweep - 11494686ms
>   - ParNew - 14690885 ms
>   - time spent in GC: 35%
> prg02 - row cache disabled
>   - uptime 23h45m
>   - ConcurrentMarkSweep - 251ms
>   - ParNew - 230791 ms
>   - time spent in GC: 0.27%
>
> I would be grateful for any hints. Please let me know if you need any
> further information. For now, we are going to disable the row cache.
>
> Regards
> Jiri Horky
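
For anyone wanting to cross-check the "time spent in GC" figures in the quoted report, here is a minimal Python sketch. It assumes the collector times are cumulative milliseconds since JVM start and that the percentage is simply the summed collector time divided by uptime; the thread does not say how the numbers were derived, and the helper name gc_fraction is made up for illustration. For CMS, part of the reported time may be concurrent work rather than stop-the-world pause, so treat the result as an upper bound on pause time.

```python
def gc_fraction(uptime_hours, uptime_minutes, collector_ms):
    """Share of uptime covered by GC, given cumulative collector times in ms.

    Hypothetical helper: assumes all collector times count fully toward
    "time spent in GC", as the quoted percentages appear to do.
    """
    uptime_ms = (uptime_hours * 60 + uptime_minutes) * 60 * 1000
    return sum(collector_ms) / uptime_ms


# Figures copied from the quoted report (ConcurrentMarkSweep, ParNew).
prg01 = gc_fraction(20, 45, [11494686, 14690885])  # row cache enabled
prg02 = gc_fraction(23, 45, [251, 230791])         # row cache disabled

print(f"prg01: {prg01:.1%}")
print(f"prg02: {prg02:.2%}")
```

Under those assumptions the results come out at roughly 35% and 0.27%, consistent with the numbers reported for prg01 and prg02.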