Mailing-List: contact user-help@cassandra.apache.org; run by ezmlm
Precedence: bulk
Reply-To: user@cassandra.apache.org
Received-SPF: pass (nike.apache.org: domain of arne@emotient.com designates
 209.85.214.196 as permitted sender)
MIME-Version: 1.0
Date: Tue, 16 Dec 2014 11:04:53 -0800
Message-ID: 
 <CAC8=97fugafn4Xhx3d1D1xwPVb_p4r7uWZygXrOKwxynZYphDQ@mail.gmail.com>
Subject: 100% CPU utilization, ParNew and never completing compactions
From: Arne Claassen <arne@emotient.com>
To: user@cassandra.apache.org
Content-Type: multipart/alternative; boundary=047d7b472448c1e8fd050a5a081f

--047d7b472448c1e8fd050a5a081f
Content-Type: text/plain; charset=UTF-8

I have a three-node cluster that has been sitting at a load of 4 (on each
node) and 100% CPU utilization (although 92% of that is nice) for the last
12 hours, ever since some significant writes finished. I'm trying to
determine what tuning I should do to get it out of this state. The debug
log is just an endless series of:

DEBUG [ScheduledTasks:1] 2014-12-16 19:03:35,042 GCInspector.java (line 118) GC for ParNew: 166 ms for 10 collections, 4400928736 used; max is 8000634880
DEBUG [ScheduledTasks:1] 2014-12-16 19:03:36,043 GCInspector.java (line 118) GC for ParNew: 165 ms for 10 collections, 4440011176 used; max is 8000634880
DEBUG [ScheduledTasks:1] 2014-12-16 19:03:37,043 GCInspector.java (line 118) GC for ParNew: 135 ms for 8 collections, 4402220568 used; max is 8000634880
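
To put rough numbers on those lines, here is a minimal Python sketch that
extracts the ParNew pause time and heap occupancy from each entry. It
assumes each entry covers about one second of wall-clock time (the
timestamps above are one second apart); the function name and the
interval_ms parameter are made up for illustration and are not part of
Cassandra.

```python
import re

# Matches the GCInspector DEBUG entries in the format quoted above.
GC_LINE = re.compile(
    r"GC for (?P<collector>\w+): (?P<ms>\d+) ms for (?P<count>\d+) collections,\s+"
    r"(?P<used>\d+) used; max is\s+(?P<max>\d+)"
)


def summarize_gc(lines, interval_ms=1000):
    """Return one (gc_fraction, heap_fraction) tuple per matching line.

    interval_ms is an assumption: the entries are logged roughly once per
    second, so each one is treated as covering a 1000 ms window.
    """
    summary = []
    for line in lines:
        match = GC_LINE.search(line)
        if match is None:
            continue  # skip anything that is not a GCInspector entry
        gc_fraction = int(match["ms"]) / interval_ms
        heap_fraction = int(match["used"]) / int(match["max"])
        summary.append((gc_fraction, heap_fraction))
    return summary


sample = [
    "DEBUG [ScheduledTasks:1] 2014-12-16 19:03:35,042 GCInspector.java (line 118) "
    "GC for ParNew: 166 ms for 10 collections, 4400928736 used; max is 8000634880",
    "DEBUG [ScheduledTasks:1] 2014-12-16 19:03:36,043 GCInspector.java (line 118) "
    "GC for ParNew: 165 ms for 10 collections, 4440011176 used; max is 8000634880",
    "DEBUG [ScheduledTasks:1] 2014-12-16 19:03:37,043 GCInspector.java (line 118) "
    "GC for ParNew: 135 ms for 8 collections, 4402220568 used; max is 8000634880",
]

for gc_fraction, heap_fraction in summarize_gc(sample):
    print(f"ParNew time: {gc_fraction:.1%} of the interval, heap used: {heap_fraction:.1%}")
```

Under that one-second assumption, the entries work out to roughly 14-17%
of wall-clock time spent in ParNew, with the heap a little over half full.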

iostat shows virtually no I/O.

Compaction may play into this, but I don't really know what to make of the
compaction stats since they never change:

[root@cassandra-37919c3a ~]# nodetool compactionstats
pending tasks: 10
compaction type   keyspace   table              completed    total         unit    progress
Compaction        media      media_tracks_raw   271651482    563615497     bytes   48.20%
Compaction        media      media_tracks_raw   30308910     21676695677   bytes   0.14%
Compaction        media      media_tracks_raw   1198384080   1815603161    bytes   66.00%
Active compaction remaining time :   0h22m24s

5 minutes later:

[root@cassandra-37919c3a ~]# nodetool compactionstats
pending tasks: 9
compaction type   keyspace   table              completed    total         unit    progress
Compaction        media      media_tracks_raw   271651482    563615497     bytes   48.20%
Compaction        media      media_tracks_raw   30308910     21676695677   bytes   0.14%
Compaction        media      media_tracks_raw   1198384080   1815603161    bytes   66.00%
Active compaction remaining time :   0h22m24s

Sure, the pending tasks went down by one, but the rest is identical.
media_tracks_raw likely has a bunch of tombstones (I can't figure out how
to get stats on that).
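
For anyone who wants to check this mechanically rather than by eye, here
is a rough Python sketch that compares two captured compactionstats
outputs and reports the rows whose completed byte count has not advanced.
The function names are hypothetical, rows are paired by position (an
assumption that holds for the two outputs above), and the sample rows
write the keyspace and table as separate columns.

```python
import re

# One row: "Compaction <keyspace/table label> <completed> <total> bytes ...".
# \s+ also matches newlines, so rows wrapped across two lines still parse.
ROW = re.compile(r"Compaction\s+(.+?)\s+(\d+)\s+(\d+)\s+bytes")


def parse_rows(snapshot):
    """Return a list of (label, completed_bytes, total_bytes) tuples."""
    return [
        (" ".join(label.split()), int(done), int(total))
        for label, done, total in ROW.findall(snapshot)
    ]


def stalled(before, after):
    """Return rows from `after` that made no progress since `before`.

    Rows are paired by position, which is an assumption: it only works
    when both snapshots list the same compactions in the same order.
    """
    result = []
    for old, new in zip(parse_rows(before), parse_rows(after)):
        same_task = old[0] == new[0] and old[2] == new[2]
        if same_task and new[1] <= old[1]:
            result.append(new)
    return result


snapshot = """
Compaction  media  media_tracks_raw   271651482    563615497  bytes  48.20%
Compaction  media  media_tracks_raw    30308910  21676695677  bytes   0.14%
Compaction  media  media_tracks_raw  1198384080   1815603161  bytes  66.00%
"""

# Both captures above are identical, so every row is reported as stalled.
for label, done, total in stalled(snapshot, snapshot):
    print(f"{label}: stuck at {done} of {total} bytes")
```

In practice the two snapshot strings would come from two separate runs of
nodetool compactionstats taken a few minutes apart.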

Does this behavior indicate that I need more heap or a larger new
generation? Should I be manually running compaction on tables with lots of
tombstones?

Any suggestions, or pointers to places where I can educate myself on
performance tuning, would be appreciated.

arne

--047d7b472448c1e8fd050a5a081f--