Date: Tue, 16 Dec 2014 11:04:53 -0800
Subject: 100% CPU utilization, ParNew and never completing compactions
From: Arne Claassen
To: user@cassandra.apache.org
Mailing-List: contact user-help@cassandra.apache.org; run by ezmlm
MIME-Version: 1.0
Content-Type: multipart/alternative; boundary=047d7b472448c1e8fd050a5a081f

--047d7b472448c1e8fd050a5a081f
Content-Type: text/plain; charset=UTF-8

I have a three-node cluster that has been sitting at a load of 4 (on each node) and 100% CPU utilization (although 92% nice) for the last 12 hours, ever since some significant writes finished. I'm trying to determine what tuning I should be doing to get it out of this state. The debug log is just an endless series of:

DEBUG [ScheduledTasks:1] 2014-12-16 19:03:35,042 GCInspector.java (line 118) GC for ParNew: 166 ms for 10 collections, 4400928736 used; max is 8000634880
DEBUG [ScheduledTasks:1] 2014-12-16 19:03:36,043 GCInspector.java (line 118) GC for ParNew: 165 ms for 10 collections, 4440011176 used; max is 8000634880
DEBUG [ScheduledTasks:1] 2014-12-16 19:03:37,043 GCInspector.java (line 118) GC for ParNew: 135 ms for 8 collections, 4402220568 used; max is 8000634880

iostat shows virtually no I/O.

Compaction may play a part in this, but I don't really know what to make of the compaction stats, since they never change:

[root@cassandra-37919c3a ~]# nodetool compactionstats
pending tasks: 10
   compaction type   keyspace              table     completed         total    unit   progress
        Compaction      media   media_tracks_raw     271651482     563615497   bytes     48.20%
        Compaction      media   media_tracks_raw      30308910   21676695677   bytes      0.14%
        Compaction      media   media_tracks_raw    1198384080    1815603161   bytes     66.00%
Active compaction remaining time :   0h22m24s

5 minutes later:

[root@cassandra-37919c3a ~]# nodetool compactionstats
pending tasks: 9
   compaction type   keyspace              table     completed         total    unit   progress
        Compaction      media   media_tracks_raw     271651482     563615497   bytes     48.20%
        Compaction      media   media_tracks_raw      30308910   21676695677   bytes      0.14%
        Compaction      media   media_tracks_raw    1198384080    1815603161   bytes     66.00%
Active compaction remaining time :   0h22m24s

Sure, the pending tasks went down by one, but the rest is identical. media_tracks_raw likely has a bunch of tombstones (I can't figure out how to get stats on that).

Is this behavior something that indicates that I need more heap or a larger new generation? Should I be manually running compaction on tables with lots of tombstones?

Any suggestions or places to educate myself better on performance tuning would be appreciated.

arne

--047d7b472448c1e8fd050a5a081f
Content-Type: text/html; charset=UTF-8
Content-Transfer-Encoding: quoted-printable
--047d7b472448c1e8fd050a5a081f--
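
One way to put a number on the GCInspector lines quoted in the message is to parse them and estimate how much wall-clock time goes to ParNew. The following is only a minimal sketch in Python. The names GC_LINE, parse_gc_lines and gc_time_fraction are hypothetical helpers, not part of Cassandra, and the calculation assumes that each log line reports the GC time accumulated since the previous line.

```python
import re
from datetime import datetime

# The three GCInspector lines quoted in the message.
LOG = [
    "DEBUG [ScheduledTasks:1] 2014-12-16 19:03:35,042 GCInspector.java (line 118) "
    "GC for ParNew: 166 ms for 10 collections, 4400928736 used; max is 8000634880",
    "DEBUG [ScheduledTasks:1] 2014-12-16 19:03:36,043 GCInspector.java (line 118) "
    "GC for ParNew: 165 ms for 10 collections, 4440011176 used; max is 8000634880",
    "DEBUG [ScheduledTasks:1] 2014-12-16 19:03:37,043 GCInspector.java (line 118) "
    "GC for ParNew: 135 ms for 8 collections, 4402220568 used; max is 8000634880",
]

# Hypothetical pattern written against the line format shown above.
GC_LINE = re.compile(
    r"(?P<ts>\d{4}-\d{2}-\d{2} \d{2}:\d{2}:\d{2},\d{3}) GCInspector\.java "
    r"\(line \d+\) GC for (?P<gc>\w+): (?P<ms>\d+) ms for (?P<n>\d+) collections, "
    r"(?P<used>\d+) used; max is (?P<max>\d+)"
)


def parse_gc_lines(lines):
    """Return one dict per matching GCInspector line; other lines are skipped."""
    samples = []
    for line in lines:
        m = GC_LINE.search(line)
        if m is None:
            continue
        samples.append({
            "ts": datetime.strptime(m["ts"], "%Y-%m-%d %H:%M:%S,%f"),
            "gc": m["gc"],
            "pause_ms": int(m["ms"]),
            "collections": int(m["n"]),
            "heap_used": int(m["used"]),
            "heap_max": int(m["max"]),
        })
    return samples


def gc_time_fraction(samples):
    """Fraction of wall-clock time spent in GC between first and last sample.

    Assumption: each line covers the interval since the previous line, so the
    first sample's pause time falls outside the measured window and is dropped.
    """
    elapsed_s = (samples[-1]["ts"] - samples[0]["ts"]).total_seconds()
    paused_s = sum(s["pause_ms"] for s in samples[1:]) / 1000.0
    return paused_s / elapsed_s


samples = parse_gc_lines(LOG)
fraction = gc_time_fraction(samples)
peak_heap = max(s["heap_used"] / s["heap_max"] for s in samples)
```

Under that assumption, the three quoted lines work out to roughly 15% of wall-clock time in ParNew, with the heap about 55% full at its peak. Three samples are far too few to draw conclusions from, so a longer stretch of the log would be needed.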