From: aaron morton <aaron@thelastpickle.com>
To: user@cassandra.apache.org
Subject: Re: memory issue on 1.1.0
Date: Wed, 6 Jun 2012 21:05:27 +1200

I looked through the log again. Still looks like it's overloaded and not handling the overload very well.

It looks like a sustained write load of around 280K columns every 5 minutes for about 5 hours. It may be that the CPU is the bottleneck when it comes to GC throughput. You are hitting ParNew issues from the very start, and end up with 20-second CMS pauses. Do you see high CPU load?

Can you enable the GC logging options in cassandra-env.sh?

Can you throttle the test back to a level where the server does not fail?

Alternatively, can you dump the heap when it gets full and see what is taking up all the space?

Cheers
-----------------
Aaron Morton
Freelance Developer
@aaronmorton
http://www.thelastpickle.com

On 6/06/2012, at 2:12 PM, Poziombka, Wade L wrote:

> Ok, so I have completely refactored to remove deletes and it still fails. So it is completely unrelated to deletes.
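The GC logging options Aaron refers to ship commented out in conf/cassandra-env.sh in the 1.1-era distribution. A sketch of what enabling them looks like; the flag set is the standard HotSpot one from that file, but the log path is an assumption, so check it against your own copy:

```shell
# GC logging options from conf/cassandra-env.sh -- uncomment (or add) to enable.
# The -Xloggc path is illustrative; point it somewhere the cassandra user can write.
JVM_OPTS="$JVM_OPTS -XX:+PrintGCDetails"
JVM_OPTS="$JVM_OPTS -XX:+PrintGCDateStamps"
JVM_OPTS="$JVM_OPTS -XX:+PrintHeapAtGC"
JVM_OPTS="$JVM_OPTS -XX:+PrintTenuringDistribution"
JVM_OPTS="$JVM_OPTS -XX:+PrintGCApplicationStoppedTime"
JVM_OPTS="$JVM_OPTS -XX:+PrintPromotionFailure"
JVM_OPTS="$JVM_OPTS -Xloggc:/var/log/cassandra/gc.log"
```

A node restart is needed for the flags to take effect; the resulting log shows per-collection pause times and tenuring distribution, which helps distinguish CPU-starved GC from genuine heap exhaustion.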
>
> I guess I need to go back to 1.0.10? When I originally evaluated I ran 1.0.8... perhaps I went a bridge too far with 1.1.
>
> I don't think I am doing anything exotic here.
>
> Here is my column family:
>
> KsDef(name:TB_UNIT, strategy_class:org.apache.cassandra.locator.SimpleStrategy, strategy_options:{replication_factor=3},
> cf_defs:[
>
> CfDef(keyspace:TB_UNIT, name:token, column_type:Standard, comparator_type:BytesType, column_metadata:[ColumnDef(name:70 61 6E 45 6E 63, validation_class:BytesType), ColumnDef(name:63 72 65 61 74 65 54 73, validation_class:DateType), ColumnDef(name:63 72 65 61 74 65 44 61 74 65, validation_class:DateType, index_type:KEYS, index_name:TokenCreateDate), ColumnDef(name:65 6E 63 72 79 70 74 69 6F 6E 53 65 74 74 69 6E 67 73 49 44, validation_class:UTF8Type, index_type:KEYS, index_name:EncryptionSettingsID)], caching:keys_only),
>
> CfDef(keyspace:TB_UNIT, name:pan_d721fd40fd9443aa81cc6f59c8e047c6, column_type:Standard, comparator_type:BytesType, caching:keys_only),
>
> CfDef(keyspace:TB_UNIT, name:counters, column_type:Standard, comparator_type:BytesType, column_metadata:[ColumnDef(name:75 73 65 43 6F 75 6E 74, validation_class:CounterColumnType)], default_validation_class:CounterColumnType, caching:keys_only)
>
> ])
>
> -----Original Message-----
> From: Poziombka, Wade L [mailto:wade.l.poziombka@intel.com]
> Sent: Tuesday, June 05, 2012 3:09 PM
> To: user@cassandra.apache.org
> Subject: RE: memory issue on 1.1.0
>
> Thank you. I do have some of the same observations. Do you do deletes?
>
> My observation is that without deletes (or column updates, I guess) I can run forever happily, but when I run (what for me is a batch process) operations that delete and modify column values, I run into this.
>
> Reading bug https://issues.apache.org/jira/browse/CASSANDRA-3741, the advice is to NOT do deletes individually and to truncate instead.
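The ColumnDef names in the KsDef dump above are hex-encoded byte strings (these column families use a BytesType comparator). A small sketch to decode them into readable names; the hex values are taken verbatim from the dump, everything else is illustrative:

```java
// Decodes the hex-encoded ColumnDef names from the KsDef dump above.
public class DecodeColumnNames {

    // Turn "70 61 6E ..." into the ASCII string it encodes.
    static String decodeHex(String hex) {
        String h = hex.replace(" ", "");
        StringBuilder sb = new StringBuilder();
        for (int i = 0; i < h.length(); i += 2) {
            sb.append((char) Integer.parseInt(h.substring(i, i + 2), 16));
        }
        return sb.toString();
    }

    public static void main(String[] args) {
        String[] names = {
            "70 61 6E 45 6E 63",                                           // CF token
            "63 72 65 61 74 65 54 73",                                     // CF token
            "63 72 65 61 74 65 44 61 74 65",                               // CF token
            "65 6E 63 72 79 70 74 69 6F 6E 53 65 74 74 69 6E 67 73 49 44", // CF token
            "75 73 65 43 6F 75 6E 74",                                     // CF counters
        };
        for (String n : names) {
            System.out.println(n + " -> " + decodeHex(n));
        }
    }
}
```

The names decode to panEnc, createTs, createDate, encryptionSettingsID and useCount, i.e. ordinary ASCII identifiers, so nothing exotic is hiding in the schema either.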
> I am scrambling to try to do this, but I'm curious whether it will be worth the effort.
>
> Wade
>
> -----Original Message-----
> From: Mina Naguib [mailto:mina.naguib@bloomdigital.com]
> Sent: Tuesday, June 05, 2012 2:24 PM
> To: user@cassandra.apache.org
> Subject: Re: memory issue on 1.1.0
>
> Hi Wade
>
> I don't know if your scenario matches mine, but I've been struggling with memory pressure in 1.x as well. I made the jump from 0.7.9 to 1.1.0, along with enabling compression and levelled compactions, so I don't know which specifically is the main culprit.
>
> Specifically, all my nodes seem to "lose" heap memory. As ParNew and CMS do their job, over any reasonable period of time, the "floor" of memory after a GC keeps rising. This is quite visible if you leave jconsole connected for a day or so, and manifests itself as a funny-looking cone like so: http://mina.naguib.ca/images/cassandra_jconsole.png
>
> Once memory pressure reaches a point where the heap can't be maintained reliably below 75%, Cassandra goes into survival mode - via a bunch of tunables in cassandra.yaml it'll do things like flush memtables, drop caches, etc. - all of which, in my experience, especially with the recent off-heap data structures, exacerbate the problem.
>
> I've been meaning, of course, to collect enough technical data to file a bug report, but haven't had the time. I have not yet tested 1.1.1 to see if it improves the situation.
>
> What I have found, however, is a band-aid, which you can see at the rightmost section of the graph in the screenshot I posted. That is simply to hit the "Perform GC" button in jconsole. It seems that a full System.gc() *DOES* reclaim heap memory that ParNew and CMS fail to reclaim.
>
> On my production cluster I have a full GC via JMX scheduled in a rolling fashion every 4 hours. It's extremely expensive (20-40 seconds of unresponsiveness) but is a necessary evil in my situation.
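The rolling full GC Mina describes can be triggered with a few lines of standard JMX client code - the same `gc` operation behind jconsole's "Perform GC" button. This is a sketch, not his actual tooling; the host argument and port 7199 (Cassandra's default JMX port) are assumptions:

```java
import javax.management.MBeanServerConnection;
import javax.management.ObjectName;
import javax.management.remote.JMXConnector;
import javax.management.remote.JMXConnectorFactory;
import javax.management.remote.JMXServiceURL;

// Trigger a full GC on a remote JVM via JMX. The java.lang:type=Memory
// MBean exposes a gc() operation that performs a full System.gc() in
// the target process - equivalent to jconsole's "Perform GC" button.
public class ForceGc {
    public static void main(String[] args) throws Exception {
        String host = args.length > 0 ? args[0] : "localhost";
        JMXServiceURL url = new JMXServiceURL(
                "service:jmx:rmi:///jndi/rmi://" + host + ":7199/jmxrmi");
        try (JMXConnector jmxc = JMXConnectorFactory.connect(url)) {
            MBeanServerConnection mbs = jmxc.getMBeanServerConnection();
            mbs.invoke(new ObjectName("java.lang:type=Memory"),
                       "gc", null, null);
        }
    }
}
```

Run against one node at a time from cron (or similar) to get the rolling 4-hour schedule; invoking it on all nodes at once would make the 20-40 second pauses overlap.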
> Without it, my nodes enter a nasty spiral of constant flushing, constant compactions, high heap usage, instability and high latency.
>
> On 2012-06-05, at 2:56 PM, Poziombka, Wade L wrote:
>
>> Alas, upgrading to 1.1.1 did not solve my issue.
>>
>> -----Original Message-----
>> From: Brandon Williams [mailto:driftx@gmail.com]
>> Sent: Monday, June 04, 2012 11:24 PM
>> To: user@cassandra.apache.org
>> Subject: Re: memory issue on 1.1.0
>>
>> Perhaps the deletes:
>> https://issues.apache.org/jira/browse/CASSANDRA-3741
>>
>> -Brandon
>>
>> On Sun, Jun 3, 2012 at 6:12 PM, Poziombka, Wade L wrote:
>>> Running a very write-intensive (new column, delete old column, etc.) process and failing on memory. Log file attached.
>>>
>>> Curiously, I have never seen this when adding new data; in the past I have sent hundreds of millions of "new" transactions. It seems to happen when I modify. My process is as follows:
>>>
>>> Key-slice to get columns to modify in batches of 100; in separate threads, modify those columns. I advance the slice by using the last key of the previous batch as the next start key. The mutations are: update a column value in one column family (token), and delete a column and add a new column in another (pan).
>>>
>>> Runs well until after about 5 million rows, then it seems to run out of memory. Note that these column families are quite small.
>>>
>>> WARN [ScheduledTasks:1] 2012-06-03 17:49:01,558 GCInspector.java (line 145) Heap is 0.7967470834946492 full. You may need to reduce memtable and/or cache sizes. Cassandra will now flush up to the two largest memtables to free up memory.
>>> Adjust flush_largest_memtables_at threshold in cassandra.yaml if you don't want Cassandra to do this automatically
>>> INFO [ScheduledTasks:1] 2012-06-03 17:49:01,559 StorageService.java (line 2772) Unable to reduce heap usage since there are no dirty column families
>>> INFO [GossipStage:1] 2012-06-03 17:49:01,999 Gossiper.java (line 797) InetAddress /10.230.34.170 is now UP
>>> INFO [ScheduledTasks:1] 2012-06-03 17:49:10,048 GCInspector.java (line 122) GC for ParNew: 206 ms for 1 collections, 7345969520 used; max is 8506048512
>>> INFO [ScheduledTasks:1] 2012-06-03 17:49:53,187 GCInspector.java (line 122) GC for ConcurrentMarkSweep: 12770 ms for 1 collections, 5714800208 used; max is 8506048512
>>>
>>> ----------------
>>> Keyspace: keyspace
>>>     Read Count: 50042632
>>>     Read Latency: 0.23157864418482224 ms.
>>>     Write Count: 44948323
>>>     Write Latency: 0.019460829472992797 ms.
>>>     Pending Tasks: 0
>>>         Column Family: pan
>>>         SSTable count: 5
>>>         Space used (live): 1977467326
>>>         Space used (total): 1977467326
>>>         Number of Keys (estimate): 16334848
>>>         Memtable Columns Count: 0
>>>         Memtable Data Size: 0
>>>         Memtable Switch Count: 74
>>>         Read Count: 14985122
>>>         Read Latency: 0.408 ms.
>>>         Write Count: 19972441
>>>         Write Latency: 0.022 ms.
>>>         Pending Tasks: 0
>>>         Bloom Filter False Postives: 829
>>>         Bloom Filter False Ratio: 0.00073
>>>         Bloom Filter Space Used: 37048400
>>>         Compacted row minimum size: 125
>>>         Compacted row maximum size: 149
>>>         Compacted row mean size: 149
>>>
>>>         Column Family: token
>>>         SSTable count: 4
>>>         Space used (live): 1250973873
>>>         Space used (total): 1250973873
>>>         Number of Keys (estimate): 14217216
>>>         Memtable Columns Count: 0
>>>         Memtable Data Size: 0
>>>         Memtable Switch Count: 49
>>>         Read Count: 30059563
>>>         Read Latency: 0.167 ms.
>>>         Write Count: 14985488
>>>         Write Latency: 0.014 ms.
>>>         Pending Tasks: 0
>>>         Bloom Filter False Postives: 13642
>>>         Bloom Filter False Ratio: 0.00322
>>>         Bloom Filter Space Used: 28002984
>>>         Compacted row minimum size: 150
>>>         Compacted row maximum size: 258
>>>         Compacted row mean size: 224
>>>
>>>         Column Family: counters
>>>         SSTable count: 2
>>>         Space used (live): 561549994
>>>         Space used (total): 561549994
>>>         Number of Keys (estimate): 9985024
>>>         Memtable Columns Count: 0
>>>         Memtable Data Size: 0
>>>         Memtable Switch Count: 38
>>>         Read Count: 4997947
>>>         Read Latency: 0.092 ms.
>>>         Write Count: 9990394
>>>         Write Latency: 0.023 ms.
>>>         Pending Tasks: 0
>>>         Bloom Filter False Postives: 191
>>>         Bloom Filter False Ratio: 0.37525
>>>         Bloom Filter Space Used: 18741152
>>>         Compacted row minimum size: 125
>>>         Compacted row maximum size: 179
>>>         Compacted row mean size: 150
>>>
>>> ----------------
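Wade's paging scheme (key slices in batches of 100, restarting each slice at the last key of the previous batch) has one classic pitfall: Cassandra range starts are inclusive, so the last key of one batch reappears as the first row of the next. A generic sketch of the loop, with `fetchBatch` standing in for the real client call (e.g. Thrift's get_range_slices); all names here are illustrative, not from Wade's code:

```java
import java.util.ArrayList;
import java.util.List;
import java.util.function.BiFunction;

// Sketch of last-key-as-next-start paging over row keys.
// fetchBatch(start, n) is assumed to return up to n keys in order,
// starting at `start` inclusive ("" meaning the start of the range).
public class KeyPager {

    static List<String> pageAllKeys(
            BiFunction<String, Integer, List<String>> fetchBatch,
            int batchSize) {
        List<String> all = new ArrayList<>();
        String start = "";
        while (true) {
            List<String> batch = fetchBatch.apply(start, batchSize);
            if (batch.isEmpty()) break;
            // Drop the first row when it is the previous batch's last
            // key, since the range start is inclusive.
            int from = (!all.isEmpty() && batch.get(0).equals(start)) ? 1 : 0;
            if (from == batch.size()) break;       // nothing new: done
            all.addAll(batch.subList(from, batch.size()));
            start = batch.get(batch.size() - 1);   // advance the slice
            if (batch.size() < batchSize) break;   // short batch: end of range
        }
        return all;
    }
}
```

Forgetting the inclusive-start skip processes every hundredth row twice, which in a delete-and-rewrite workload like Wade's quietly doubles the tombstone traffic on those rows.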