From: Aaron Morton
Subject: Re: Cass 2.0.0: Extensive memory allocation when row_cache enabled
Date: Tue, 12 Nov 2013 17:29:04 +1300
To: Cassandra User <user@cassandra.apache.org>

>> Are you doing large slices, or could you have a lot of tombstones on the rows?
> don't really know - how can I monitor that?

For tombstones, do you do a lot of deletes?

Also, in v2.0.2 cfstats has this:

    Average live cells per slice (last five minutes): 0.0
    Average tombstones per slice (last five minutes): 0.0
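If you want to watch those numbers from your own tooling rather than cfstats, the per-CF metrics are also exposed over JMX. A rough sketch in Java, assuming the node is reachable on the default JMX port 7199 and that the metric sits under the usual org.apache.cassandra.metrics ColumnFamily naming (double check the exact MBean name on your version; "my_ks" and "my_cf" are placeholders):

    import javax.management.MBeanServerConnection;
    import javax.management.ObjectName;
    import javax.management.remote.JMXConnector;
    import javax.management.remote.JMXConnectorFactory;
    import javax.management.remote.JMXServiceURL;

    public class TombstoneMetrics {
        public static void main(String[] args) throws Exception {
            // Assumption: node reachable on Cassandra's default JMX port 7199.
            JMXServiceURL url = new JMXServiceURL(
                    "service:jmx:rmi:///jndi/rmi://127.0.0.1:7199/jmxrmi");
            try (JMXConnector jmxc = JMXConnectorFactory.connect(url)) {
                MBeanServerConnection mbs = jmxc.getMBeanServerConnection();

                // Assumed MBean name for the metric behind "tombstones per slice";
                // "my_ks" and "my_cf" stand in for your keyspace and column family.
                ObjectName tombstones = new ObjectName(
                        "org.apache.cassandra.metrics:type=ColumnFamily,"
                        + "keyspace=my_ks,scope=my_cf,name=TombstoneScannedHistogram");

                // Histogram MBeans expose Count, Mean, Max, percentiles, etc.
                System.out.println("tombstones scanned per read (mean): "
                        + mbs.getAttribute(tombstones, "Mean"));
            }
        }
    }

(nodetool cfstats gives you the same numbers without writing any code.)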
For large slices you need to check your code, e.g. do you do anything that reads lots of columns, reads very large columns, or lets the user select how many columns to read?

The org.apache.cassandra.db.ArrayBackedSortedColumns in the trace is used during reads (e.g. by org.apache.cassandra.db.filter.SliceQueryFilter).
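For example, something like this is the sort of read that becomes one big slice on the node. It is only a sketch against the DataStax Java driver 2.x API; the contact point, keyspace, table, column names and the user-supplied LIMIT are all made-up placeholders:

    import com.datastax.driver.core.Cluster;
    import com.datastax.driver.core.ResultSet;
    import com.datastax.driver.core.Row;
    import com.datastax.driver.core.Session;

    public class WideSliceRead {
        public static void main(String[] args) {
            // Placeholders: contact point, keyspace and table names are made up.
            Cluster cluster = Cluster.builder().addContactPoint("127.0.0.1").build();
            try {
                Session session = cluster.connect("my_ks");

                // A user-supplied (or missing) LIMIT on a wide partition means the
                // node has to materialise that many cells for a single read.
                int userSuppliedLimit = 100000;
                ResultSet rs = session.execute(
                        "SELECT col_name, col_value FROM wide_table"
                        + " WHERE partition_key = 'x' LIMIT " + userSuppliedLimit);

                for (Row row : rs) {
                    // ... process each cell of the slice ...
                }
            } finally {
                cluster.close();
            }
        }
    }

If that limit is unbounded or user controlled, a single request can pull a very large slice onto the heap.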
>> You probably want the heap to be 4G to 8G in size; 10G will encounter longer pauses.
>> Also the size of the new heap may be too big depending on the number of cores. I would recommend trying 800M.
> I tried to decrease it first to 384M then to 128M with no change in the behaviour. I don't really care about the extra memory overhead of the cache - to be able to actually point to it with objects - but I don't really see the reason why it should create/delete that many objects so quickly.

Not sure what you changed to 384M.

>> Shows the heap growing very quickly. This could be due to wide reads or a high write throughput.
> Well, both prg01 and prg02 receive the same load, which is about ~150-250 (during peak) read requests per second and 100-160 write requests per second. The only ones with the heap growing rapidly and GC kicking in are the nodes with the row cache enabled.

This sounds like on a row cache miss Cassandra is reading the whole row, which happens to be a wide row. I would also guess some writes are going to those rows and they are getting invalidated out of the row cache.

The row cache is not great for rows that update frequently and/or for wide rows.

How big are the rows? Use nodetool cfstats and nodetool cfhistograms.
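To make that concrete, here is a toy read-through cache in plain Java. It is illustrative only - the class and method names are made up and this is not Cassandra's actual cache code - but it shows why wide rows that also take writes churn the heap:

    import java.util.List;
    import java.util.concurrent.ConcurrentHashMap;
    import java.util.concurrent.ConcurrentMap;

    // Illustrative only: a toy read-through cache, not Cassandra's cache code.
    public class ToyRowCache {
        // Made-up stand-in for a fully materialised row (all of its cells).
        public static class WholeRow { public List<byte[]> cells; }

        // Made-up loader standing in for the memtable/sstable read path.
        public interface RowLoader { WholeRow loadWholeRow(String key); }

        private final ConcurrentMap<String, WholeRow> cache =
                new ConcurrentHashMap<String, WholeRow>();
        private final RowLoader loader;

        public ToyRowCache(RowLoader loader) { this.loader = loader; }

        public WholeRow read(String key) {
            WholeRow cached = cache.get(key);
            if (cached != null) {
                return cached;                         // hit: cheap, no big allocation
            }
            WholeRow whole = loader.loadWholeRow(key); // miss: the ENTIRE row is read
            cache.put(key, whole);                     // and materialised, even if the
            return whole;                              // query only wanted a few cells
        }

        public void write(String key /* , mutation */) {
            cache.remove(key);  // a write invalidates the cached row, so the next
                                // read of that row is another full-row miss
            // ... apply the mutation to the memtable as usual ...
        }
    }

With a wide row that is also written to regularly, you pay the full-row materialisation over and over, which matches the allocation churn and GC you are seeing on the row-cache nodes.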
Cheers

-----------------
Aaron Morton
New Zealand
@aaronmorton

Co-Founder & Principal Consultant
Apache Cassandra Consulting
http://www.thelastpickle.com

On 8/11/2013, at 11:47 pm, Jiri Horky <horky@avast.com> wrote:

> Hi,
>
> On 11/07/2013 05:18 AM, Aaron Morton wrote:
>>> Class Name | Shallow Heap | Retained Heap
>>> -----------------------------------------------------------------------------------------------------------------------
>>> java.nio.HeapByteBuffer @ 0x7806a0848 | 48 | 80
>>> '- name org.apache.cassandra.db.Column @ 0x7806424e8 | 32 | 112
>>>    |- [338530] java.lang.Object[540217] @ 0x57d62f560 Unreachable | 2,160,888 | 2,160,888
>>>    |- [338530] java.lang.Object[810325] @ 0x591546540 | 3,241,320 | 7,820,328
>>>    |  '- elementData java.util.ArrayList @ 0x75e8424c0 | 24 | 7,820,352
>>>    |     |- list org.apache.cassandra.db.ArrayBackedSortedColumns$SlicesIterator @ 0x5940e0b18 | 48 | 128
>>>    |     |  '- val$filteredIter org.apache.cassandra.db.filter.SliceQueryFilter$1 @ 0x5940e0b48 | 32 | 7,820,568
>>>    |     |     '- val$iter org.apache.cassandra.db.filter.QueryFilter$2 @ 0x5940e0b68 Unreachable | 24 | 7,820,592
>>>    |     |- this$0, parent java.util.ArrayList$SubList @ 0x5940e0bb8 | 40 | 40
>>>    |     |  '- this$1 java.util.ArrayList$SubList$1 @ 0x5940e0be0 | 40 | 80
>>>    |     |     '- currentSlice org.apache.cassandra.db.ArrayBackedSortedColumns$SlicesIterator @ 0x5940e0b18 | 48 | 128
>>>    |     |        '- val$filteredIter org.apache.cassandra.db.filter.SliceQueryFilter$1 @ 0x5940e0b48 | 32 | 7,820,568
>>>    |     |           '- val$iter org.apache.cassandra.db.filter.QueryFilter$2 @ 0x5940e0b68 Unreachable | 24 | 7,820,592
>>>    |     |- columns org.apache.cassandra.db.ArrayBackedSortedColumns @ 0x5b0a33488 | 32 | 56
>>>    |     |  '- val$cf org.apache.cassandra.db.filter.SliceQueryFilter$1 @ 0x5940e0b48 | 32 | 7,820,568
>>>    |     |     '- val$iter org.apache.cassandra.db.filter.QueryFilter$2 @ 0x5940e0b68 Unreachable | 24 | 7,820,592
>>>    |     '- Total: 3 entries | |
>>>    |- [338530] java.lang.Object[360145] @ 0x7736ce2f0 Unreachable | 1,440,600 | 1,440,600
>>>    '- Total: 3 entries | |
>>
>> Are you doing large slices, or could you have a lot of tombstones on the rows?
> don't really know - how can I monitor that?
>>
>>> We have disabled row cache on one node to see the difference. Please
>>> see attached plots from visual VM, I think that the effect is quite
>>> visible.
>> The default row cache is off the JVM heap; have you changed to the ConcurrentLinkedHashCacheProvider?
> Answered by Chris already :) No.
>>
>> One way the SerializingCacheProvider could impact GC is if the CF takes a lot of writes. The SerializingCacheProvider invalidates the row when it is written to and has to read the entire row and serialise it on a cache miss.
>>
>>>> -XX:+UseThreadPriorities -XX:ThreadPriorityPolicy=42 -Xms10G -Xmx10G
>>>> -Xmn1024M -XX:+HeapDumpOnOutOfMemoryError
>> You probably want the heap to be 4G to 8G in size; 10G will encounter longer pauses.
>> Also the size of the new heap may be too big depending on the number of cores. I would recommend trying 800M.
> I tried to decrease it first to 384M then to 128M with no change in the behaviour. I don't really care about the extra memory overhead of the cache - to be able to actually point to it with objects - but I don't really see the reason why it should create/delete that many objects so quickly.
>>
>>
>>> prg01.visual.vm.png
>> Shows the heap growing very quickly. This could be due to wide reads or a high write throughput.
> Well, both prg01 and prg02 receive the same load, which is about ~150-250 (during peak) read requests per second and 100-160 write requests per second. The only ones with the heap growing rapidly and GC kicking in are the nodes with the row cache enabled.
>
>> Hope that helps.
> Thank you!
>
> Jiri Horky