From: Julien Anguenot <julien@anguenot.org>
Subject: Re: Cassandra eats all cpu cores, high load average
Date: Fri, 12 Feb 2016 11:24:14 -0600
To: user@cassandra.apache.org

If you are positive this is not compaction related, I would:

   1. check disk IOPs and latency on the EBS volume (dstat).
   2. turn GC logging on in cassandra-env.sh and use jstat to see what is happening to your HEAP (example commands below).
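For example, something along these lines (a sketch assuming a package install where cassandra-env.sh is under /etc/cassandra and logs go to /var/log/cassandra; adjust paths to your layout, flags shown are the Java 8 ones):

   # in cassandra-env.sh, turn the GC log on:
   JVM_OPTS="$JVM_OPTS -Xloggc:/var/log/cassandra/gc.log"
   JVM_OPTS="$JVM_OPTS -XX:+PrintGCDetails -XX:+PrintGCDateStamps -XX:+PrintGCApplicationStoppedTime"

   # then watch heap occupancy and GC activity live, one sample per second:
   $ jstat -gcutil $(pgrep -f CassandraDaemon) 1000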

I have been asking about compactions initially because having one (1) big table written by all nodes and fully replicated to all nodes in the cluster would definitely trigger constant compactions on every node, depending on write throughput.

   J. 

On Feb 12, 2016, at 11:03 AM, Skvazh Roman <r@skvazh.com> wrote:

Does the load decrease and does the node answer requests “normally” when you disable auto-compaction? You actually see pending compactions on nodes having high load, correct?
Nope.

All seems legit here. Using G1 GC?
Yes

Problems also occurred on nodes without pending compactions.



On 12 Feb 2016, at 18:44, Julien Anguenot <julien@anguenot.org> wrote:


On Feb 12, 2016, at 9:24 AM, Skvazh Roman <r@skvazh.com> wrote:

I have disabled autocompaction and stopped it on the high-load node.
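For reference, a minimal sketch of that toggle on a node (my_ks and big_table are placeholder names):

   $ nodetool disableautocompaction my_ks big_table
   $ nodetool enableautocompaction my_ks big_table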

Does the load decrease and does the node answer requests “normally” when you disable auto-compaction? You actually see pending compactions on nodes having high load, correct?

Heap is 8 GB. gc_grace is 86400.
All SSTables are about 200-300 MB.

All seems legit here. Using G1 GC?

$ nodetool compactionstats
pending tasks: 14

Try to increase the compactors from 4 to 6-8 on a node, disable gossip, let it finish compacting, and put it back in the ring by enabling gossip. See what happens (rough sequence below).
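A rough sequence, assuming concurrent_compactors has already been bumped in cassandra.yaml on that node (which typically means restarting it first):

   $ nodetool disablegossip      # take the node out of the ring
   $ nodetool compactionstats    # watch pending tasks drain
   $ nodetool enablegossip       # rejoin once compactions have caught up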

The tombstones count growing is because auto-compactions are disabled on these nodes. Probably not your issue.

   J.



$ dstat -lvnr 10
---load-avg--- ---procs--- ------memory-usage----- ---paging-- -dsk/total- ---system-- ----total-cpu-usage---- -net/total- --io/total-
 1m   5m  15m |run blk new| used  buff  cach  free|  in   out | read  writ| int   csw |usr sys idl wai hiq siq| recv  send| read  writ
29.4 28.6 23.5|0.0   0 1.2|11.3G  190M 17.6G  407M|   0     0 |7507k 7330k|  13k   40k| 11   1  88   0   0   0|   0     0 |96.5  64.6
29.3 28.6 23.5| 29   0 0.9|11.3G  190M 17.6G  408M|   0     0 |   0   189k|9822  2319 | 99   0   0   0   0   0| 138k  120k|   0  4.30
29.4 28.6 23.6| 30   0 2.0|11.3G  190M 17.6G  408M|   0     0 |   0    26k|8689  2189 |100   0   0   0   0   0| 139k  120k|   0  2.70
29.4 28.7 23.6| 29   0 3.0|11.3G  190M 17.6G  408M|   0     0 |   0    20k|8722  1846 | 99   0   0   0   0   0| 136k  120k|   0  1.50 ^C


JvmTop 0.8.0 alpha - 15:20:37,  amd64, 16 cpus, Linux 3.14.44-3, load avg 28.09
http://code.google.com/p/jvmtop

PID 32505: org.apache.cassandra.service.CassandraDaemon
ARGS:
VMARGS: -ea -javaagent:/usr/share/cassandra/lib/jamm-0.3.0.jar -XX:+CMSCl[...]
VM: Oracle Corporation Java HotSpot(TM) 64-Bit Server VM 1.8.0_65
UP:  8:31m  #THR: 334  #THRPEAK: 437  #THRCREATED: 4694  USER: cassandra
GC-Time:  0: 8m  #GC-Runs: 6378  #TotalLoadedClasses: 5926
CPU: 97.96% GC:  0.00% HEAP: 6049m / 7540m  NONHEAP: 82m / n/a

  TID  NAME                   STATE     CPU      TOTALCPU  BLOCKEDBY
  447  SharedPool-Worker-45   RUNNABLE  60.47%   1.03%
  343  SharedPool-Worker-2    RUNNABLE  56.46%   3.07%
  349  SharedPool-Worker-8    RUNNABLE  56.43%   1.61%
  456  SharedPool-Worker-25   RUNNABLE  55.25%   1.06%
  483  SharedPool-Worker-40   RUNNABLE  53.06%   1.04%
  475  SharedPool-Worker-53   RUNNABLE  52.31%   1.03%
  464  SharedPool-Worker-20   RUNNABLE  52.00%   1.11%
  577  SharedPool-Worker-71   RUNNABLE  51.73%   1.02%
  404  SharedPool-Worker-10   RUNNABLE  51.10%   1.29%
  486  SharedPool-Worker-34   RUNNABLE  51.06%   1.03%
Note: Only top 10 threads (according cpu load) are shown!


On 12 Feb 2016, at 18:14, Julien Anguenot <julien@anguenot.org> wrote:

At the time when the load is high and you have to restart, do you see any pending compactions when using `nodetool compactionstats`?

Possible to see a `nodetool compactionstats` taken *when* the load is too high? Have you checked the size of your SSTables for that big table? Any large ones in there? What about the Java HEAP configuration on these nodes?
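For instance, quick checks along these lines (data and config paths assume a default package install; my_ks and big_table are placeholders):

   # biggest SSTables for that table
   $ ls -lhS /var/lib/cassandra/data/my_ks/big_table-*/*-Data.db | head
   # heap sizing actually configured
   $ grep -E 'MAX_HEAP_SIZE|HEAP_NEWSIZE' /etc/cassandra/cassandra-env.sh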

If you have too many tombstones, I would try to decrease gc_grace_seconds so they get cleared out earlier during compactions.
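For example, per table from cqlsh (keyspace, table, and the value are placeholders; keep it comfortably above your repair interval so deleted data does not get resurrected):

   $ cqlsh -e "ALTER TABLE my_ks.big_table WITH gc_grace_seconds = 43200;"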

 J.

On Feb 12, 2016, at 8:45 AM, Skvazh Roman <r@skvazh.com> wrote:

There are 1-4 compactions at that moment.
We have many tombstones which are not removed.
DroppableTombstoneRatio is 5-6 (greater than 1).
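One way to confirm that per SSTable is sstablemetadata (the path is a placeholder for the table's Data.db files):

   $ sstablemetadata /var/lib/cassandra/data/my_ks/big_table-*/*-Data.db | grep -i droppable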

On 12 Feb 2016, at 15:53, Julien Anguenot <julien@anguenot.org> wrote:

Hey, 

What about the compactions count when that is happening?

J.


On Feb 12, 2016, at 3:06 AM, Skvazh Roman <r@skvazh.com> wrote:

Hello!
We have a cluster of 25 c3.4xlarge nodes (16 cores, 32 GiB) with an attached 1.5 TB 4000 PIOPS EBS drive.
Sometimes user CPU on one or two nodes spikes to 100% and load average goes to 20-30 - read requests drop off.
Only a restart of the Cassandra service helps.
Please advise.

One big table with wide rows. 600 GB per node.
LZ4Compressor
LeveledCompaction

concurrent compactors: 4
compaction throughput: tried from 16 to 128
concurrent_readers: from 16 to 32
concurrent_writers: 128
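The matching cassandra.yaml keys for the settings above can be checked with something like this (default config path assumed):

   $ grep -E 'concurrent_compactors|compaction_throughput_mb_per_sec|concurrent_reads|concurrent_writes' /etc/cassandra/cassandra.yaml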


https://gist.github.com/rskvazh/de916327779b98a437a6


JvmTop 0.8.0 alpha - 06:51:10,  amd64, 16 cpus, Linux 3.14.44-3, load avg 19.35
http://code.google.com/p/jvmtop

Profiling PID 9256: org.apache.cassandra.service.CassandraDa

95.73% (  4.31s) ....google.common.collect.AbstractIterator.tryToComputeN()
 1.39% (  0.06s) com.google.common.base.Objects.hashCode()
 1.26% (  0.06s) io.netty.channel.epoll.Native.epollWait()
 0.85% (  0.04s) net.jpountz.lz4.LZ4JNI.LZ4_compress_limitedOutput()
 0.46% (  0.02s) net.jpountz.lz4.LZ4JNI.LZ4_decompress_fast()
 0.26% (  0.01s) com.google.common.collect.Iterators$7.computeNext()
 0.06% (  0.00s) io.netty.channel.epoll.Native.eventFdWrite()


ttop:

2016-02-12T08:20:25.605+0000 Process summary
process cpu=1565.15%
application cpu=1314.48% (user=1354.48% sys=-40.00%)
other: cpu=250.67%
heap allocation rate 146mb/s
[000405] user=76.25% sys=-0.54%  alloc=     0b/s - SharedPool-Worker-9
[000457] user=75.54% sys=-1.26%  alloc=     0b/s - SharedPool-Worker-14
[000451] user=73.52% sys= 0.29%  alloc=     0b/s - SharedPool-Worker-16
[000311] user=76.45% sys=-2.99%  alloc=     0b/s - SharedPool-Worker-4
[000389] user=70.69% sys= 2.62%  alloc=     0b/s - SharedPool-Worker-6
[000388] user=86.95% sys=-14.28% alloc=     0b/s - SharedPool-Worker-5
[000404] user=70.69% sys= 0.10%  alloc=     0b/s - SharedPool-Worker-8
[000390] user=72.61% sys=-1.82%  alloc=     0b/s - SharedPool-Worker-7
[000255] user=87.86% sys=-17.87% alloc=     0b/s - SharedPool-Worker-1
[000444] user=72.21% sys=-2.30%  alloc=     0b/s - SharedPool-Worker-12
[000310] user=71.50% sys=-2.31%  alloc=     0b/s - SharedPool-Worker-3
[000445] user=69.68% sys=-0.83%  alloc=     0b/s - SharedPool-Worker-13
[000406] user=72.61% sys=-4.40%  alloc=     0b/s - SharedPool-Worker-10
[000446] user=69.78% sys=-1.65%  alloc=     0b/s - SharedPool-Worker-11
[000452] user=66.86% sys= 0.22%  alloc=     0b/s - SharedPool-Worker-15
[000256] user=69.08% sys=-2.42%  alloc=     0b/s - SharedPool-Worker-2
[004496] user=29.99% sys= 0.59%  alloc=   30mb/s - CompactionExecutor:15
[004906] user=29.49% sys= 0.74%  alloc=   39mb/s - CompactionExecutor:16
[010143] user=28.58% sys= 0.25%  alloc=   26mb/s - CompactionExecutor:17
[000785] user=27.87% sys= 0.70%  alloc=   38mb/s - CompactionExecutor:12
[012723] user= 9.09% sys= 2.46%  alloc= 2977kb/s - RMI TCP Connection(2673)-127.0.0.1
[000555] user= 5.35% sys=-0.08%  alloc=  474kb/s - SharedPool-Worker-24
[000560] user= 3.94% sys= 0.07%  alloc=  434kb/s - SharedPool-Worker-22
[000557] user= 3.94% sys=-0.17%  alloc=  339kb/s - SharedPool-Worker-25
[000447] user= 2.73% sys= 0.60%  alloc=  436kb/s - SharedPool-Worker-19
[000563] user= 3.33% sys=-0.04%  alloc=  460kb/s - SharedPool-Worker-20
[000448] user= 2.73% sys= 0.27%  alloc=  414kb/s - SharedPool-Worker-21
[000554] user= 1.72% sys= 0.70%  alloc=  232kb/s - SharedPool-Worker-26
[000558] user= 1.41% sys= 0.39%  alloc=  213kb/s - SharedPool-Worker-23
[000450] user= 1.41% sys=-0.03%  alloc=  158kb/s - SharedPool-Worker-17






--
Julien Anguenot (@anguenot)
USA +1.832.408.0344
FR +33.7.86.85.70.44

