From: Caleb Rackliffe <caleb@steelhouse.com>
To: "user@cassandra.apache.org" <user@cassandra.apache.org>
Cc: "aaron@thelastpickle.com" <aaron@thelastpickle.com>
Date: Sun, 8 Jan 2012 19:25:33 -0500
Subject: Re: Lots and Lots of CompactionReducer Threads
After some searching, I think I may have found something in the code itself, and so I've filed a bug report - https://issues.apache.org/jira/browse/CASSANDRA-3711

Caleb Rackliffe | Software Developer
M 949.981.0159 | caleb@steelhouse.com

From: Caleb Rackliffe <caleb@steelhouse.com>
Reply-To: "user@cassandra.apache.org" <user@cassandra.apache.org>
Date: Sun, 8 Jan 2012 17:48:59 -0500
To: "user@cassandra.apache.org" <user@cassandra.apache.org>
Cc: "aaron@thelastpickle.com" <aaron@thelastpickle.com>
Subject: Re: Lots and Lots of CompactionReducer Threads

With the exception of a few little warnings on start-up about the Memtable live ratio, there is nothing at WARN or above in the logs.  Just before the JVM terminates, there are about 10,000 threads in Reducer executor pools that look like this in JConsole…

Name: CompactionReducer:1
State: TIMED_WAITING on java.util.concurrent.SynchronousQueue$TransferStack@72938aea
Total blocked: 0  Total waited: 1

Stack trace:
sun.misc.Unsafe.park(Native Method)
java.util.concurrent.locks.LockSupport.parkNanos(LockSupport.java:226)
java.util.concurrent.SynchronousQueue$TransferStack.awaitFulfill(SynchronousQueue.java:460)
java.util.concurrent.SynchronousQueue$TransferStack.transfer(SynchronousQueue.java:359)
java.util.concurrent.SynchronousQueue.poll(SynchronousQueue.java:942)
java.util.concurrent.ThreadPoolExecutor.getTask(ThreadPoolExecutor.java:1043)
java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1103)
java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:603)
java.lang.Thread.run(Thread.java:722)

The results from tpstats don't look too interesting…

Pool Name                    Active   Pending      Completed   Blocked  All time blocked
ReadStage                         0         0        3455159         0                 0
RequestResponseStage              0         0       10133276         0                 0
MutationStage                     0         0        5898833         0                 0
ReadRepairStage                   0         0        2078449         0                 0
ReplicateOnWriteStage             0         0              0         0                 0
GossipStage                       0         0         236388         0                 0
AntiEntropyStage                  0         0              0         0                 0
MigrationStage                    0         0              0         0                 0
MemtablePostFlusher               0         0            231         0                 0
StreamStage                       0         0              0         0                 0
FlushWriter                       0         0            231         0                 0
MiscStage                         0         0              0         0                 0
InternalResponseStage             0         0              0         0                 0
HintedHandoff                     0         0             35         0                 0

Message type           Dropped
RANGE_SLICE                  0
READ_REPAIR                  0
BINARY                       0
READ                         0
MUTATION                     0
REQUEST_RESPONSE             0

The results from info seem unremarkable as well…

Token            : 153127065000000000000000000000000000000
Gossip active    : true
Load             : 5.6 GB
Generation No    : 1325995515
Uptime (seconds) : 67199
Heap Memory (MB) : 970.32 / 1968.00
Data Center      : datacenter1
Rack             : rack1
Exceptions       : 0

I'm using LeveledCompactionStrategy with no throttling, and I'm not changing the default on the number of concurrent compactors.

What is interesting to me here is that Cassandra creates an executor for every single compaction in ParallelCompactionIterable.  Why couldn't we just create a pool with Runtime.availableProcessors() threads and be done with it?

Let me know if I left any info out.

Thanks!

Caleb Rackliffe | Software Developer
M 949.981.0159 | caleb@steelhouse.com

From: aaron morton <aaron@thelastpickle.com>
Reply-To: "user@cassandra.apache.org" <user@cassandra.apache.org>
Date: Sun, 8 Jan 2012 16:51:50 -0500
To: "user@cassandra.apache.org" <user@cassandra.apache.org>
Subject: Re: Lots and Lots of CompactionReducer Threads

How many threads? Any errors in the server logs?

What does nodetool tpstats and nodetool compactionstats say?

Did you change the compaction_strategy for the CFs?

By default Cassandra will use as many compaction threads as you have cores; see concurrent_compactors in cassandra.yaml.

Have you set the JVM heap settings? What does nodetool info show?

Hope that helps.

-----------------
Aaron Morton
Freelance Developer
@aaronmorton
http://www.thelastpickle.com

On 8/01/2012, at 3:51 PM, Caleb Rackliffe wrote:

Hi Everybody,

JConsole tells me I've got CompactionReducer threads stacking up, consuming memory, and never going away.  Eventually, my Java process fails because it can't allocate any more native threads.  Here's my setup…

Cassandra 1.0.5 on CentOS 6.0
4 GB of RAM
50 GB SSD HD
Memtable flush threshold = 128 MB
compaction throughput limit = 16 MB/sec
Multithreaded compaction = true

It may very well be that I'm doing something strange here, but it seems like those compaction threads should go away eventually.  I'm hoping the combination of a low Memtable flush threshold, low compaction T/P limit, and heavy write load doesn't mean those threads are hanging around because they're actually not done doing their compaction tasks.

Thanks,

Caleb Rackliffe | Software Developer
M 949.981.0159 | caleb@steelhouse.com
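[Editor's note: the settings discussed in this thread map onto cassandra.yaml roughly as follows. This is a hedged sketch based on the 1.0.x option names and the values quoted in the thread, not a verified copy of the poster's config; the per-CF memtable flush threshold lives elsewhere (in the column family definition) and is omitted here.]

```yaml
# compaction knobs from the 1.0.x cassandra.yaml, per the thread above
compaction_throughput_mb_per_sec: 16   # "compaction throughput limit = 16 MB/sec"
multithreaded_compaction: true         # enables the ParallelCompactionIterable path
# Left unset, so it defaults to the number of cores (Aaron's point):
# concurrent_compactors: 4
```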
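[Editor's note: the leak pattern described in this thread, an executor created per compaction and never shut down, can be sketched in isolation. This is hypothetical illustration code, not the actual Cassandra source: `leakDemo` mimics the per-compaction-executor pattern, and `sharedPoolDemo` mimics the fix Caleb suggests (one pool sized to `Runtime.availableProcessors()`).]

```java
import java.util.concurrent.ExecutorService;
import java.util.concurrent.Executors;
import java.util.concurrent.TimeUnit;

public class CompactionExecutorLeak {

    // Mimics the reported pattern: a fresh executor per "compaction",
    // never shut down. Each executor pins a worker thread that parks
    // forever after its task completes, like the JConsole dump shows.
    static int leakDemo(int compactions) throws Exception {
        int before = Thread.activeCount();
        for (int i = 0; i < compactions; i++) {
            ExecutorService perCompaction = Executors.newFixedThreadPool(1);
            perCompaction.submit(() -> {}).get(); // task finishes...
            // ...but the executor is never shut down, so the thread lingers.
        }
        return Thread.activeCount() - before; // roughly one leaked thread each
    }

    // The alternative: one shared pool sized to the core count, reused for
    // every compaction and shut down once. Thread count stays bounded by
    // the number of cores no matter how many compactions run.
    static int sharedPoolDemo(int compactions) throws Exception {
        int before = Thread.activeCount();
        ExecutorService shared = Executors.newFixedThreadPool(
                Runtime.getRuntime().availableProcessors());
        for (int i = 0; i < compactions; i++) {
            shared.submit(() -> {}).get();
        }
        int created = Thread.activeCount() - before;
        shared.shutdown();
        shared.awaitTermination(10, TimeUnit.SECONDS);
        return created;
    }

    public static void main(String[] args) throws Exception {
        System.out.println("threads leaked by per-task executors: " + leakDemo(50));
        System.out.println("threads used by one shared pool:      " + sharedPoolDemo(50));
    }
}
```

In the real stack trace the workers sit in `SynchronousQueue.poll` with a keep-alive timeout, but the net effect is the same when executors accumulate faster than their idle threads can expire.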