cassandra-user mailing list archives

From Marcin Pietraszek <mpietras...@opera.com>
Subject Re: High CPU load
Date Tue, 28 Jul 2015 14:29:20 GMT
I'm still struggling to find the root cause of these CPU utilisation
patterns.

http://i58.tinypic.com/24pifcy.jpg

About three weeks after a C* restart, CPU utilisation goes through the
roof; this doesn't happen shortly after the restart (as visible on the
graph).

C* is running on machines with 16 cores, and HEAP_NEWSIZE was set to
1600MB. Looking at the gc logs I found:

2015-07-27T03:27:41.891+0000: 3354764.688: [ParNew:
1362069K->51348K(1474560K), 0.0942700 secs]
3268635K->1957914K(8224768K)After GC:
2015-07-27T03:27:41.999+0000: 3354764.797: [ParNew:
55327K->48503K(1474560K), 0.0935170 secs]
1961894K->1955069K(8224768K)After GC:
2015-07-27T03:27:43.464+0000: 3354766.261: [ParNew:
1359224K->54214K(1474560K), 0.0922640 secs]
3265790K->1960780K(8224768K)After GC:
2015-07-27T03:27:43.570+0000: 3354766.368: [ParNew:
56165K->45672K(1474560K), 0.0948220 secs]
1962732K->1952239K(8224768K)After GC:
2015-07-27T03:27:45.016+0000: 3354767.814: [ParNew:
1356393K->54245K(1474560K), 0.0922290 secs]
3262959K->1960811K(8224768K)After GC:
2015-07-27T03:27:45.121+0000: 3354767.919: [ParNew:
57866K->48553K(1474560K), 0.0928670 secs]
1964433K->1955119K(8224768K)After GC:
2015-07-27T03:27:46.590+0000: 3354769.387: [ParNew:
1359420K->48560K(1474560K), 0.0913750 secs]
3265986K->1955126K(8224768K)After GC:

It's clear to me that ParNew needs to run when 1359420K out of
1474560K is taken. But why is it run when only 57866K/1474560K (~4%)
of the young generation is occupied? SurvivorRatio is set to 8 in our
case.
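
For what it's worth, those capacity numbers line up with our settings.
A rough sanity check (assuming the usual eden + two survivor layout,
where ParNew reports eden + one survivor as its capacity):

YOUNG_K=$((1600 * 1024))              # HEAP_NEWSIZE=1600M -> 1638400K
SURVIVOR_K=$((YOUNG_K / (8 + 2)))     # SurvivorRatio=8 -> 163840K each
EDEN_K=$((YOUNG_K - 2 * SURVIVOR_K))  # 1310720K
echo "ParNew capacity: $((EDEN_K + SURVIVOR_K))K"  # 1474560K, as in the logs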

What's also bothering me is the CMS concurrent sweep time:
gc.log.9:2015-07-27T07:54:36.561+0000: 3370772.285:
[CMS-concurrent-sweep: 19.710/20.600 secs] [Times: user=0.00
sys=317.60, real=20.60 secs]
Isn't it quite high?
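
Something like the following (the log path is just where our logs
happen to live) can list all of the sweep wall-clock times, to see
whether 20 seconds is the norm or an outlier:

grep -h 'CMS-concurrent-sweep:' /var/log/cassandra/gc.log* \
  | sed 's/.*real=\([0-9.]*\) secs.*/\1/' \
  | sort -n | tail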

Do you guys have any other thoughts? Which part of the gc logs
requires more attention?
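
In case it helps, the safepoint pauses can be summarised from the
"Total time for which application threads were stopped" lines with
something like this (the path is only an example):

grep -h 'Total time for which application threads were stopped' \
    /var/log/cassandra/gc.log* \
  | awk '{ sum += $(NF-1); n++ }
         END { printf "%d pauses, %.1fs total, %.4fs avg\n", n, sum, sum/n }'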


On Tue, Jul 21, 2015 at 4:22 PM, Marcin Pietraszek
<mpietraszek@opera.com> wrote:
> Yup... it seems like it's the gc's fault
>
> gc logs
>
> 2015-07-21T14:19:54.336+0000: 2876133.270: Total time for which
> application threads were stopped: 0.0832030 seconds
> 2015-07-21T14:19:55.739+0000: 2876134.673: Total time for which
> application threads were stopped: 0.0806960 seconds
> 2015-07-21T14:19:57.149+0000: 2876136.083: Total time for which
> application threads were stopped: 0.0806890 seconds
> 2015-07-21T14:19:58.550+0000: 2876137.484: Total time for which
> application threads were stopped: 0.0821070 seconds
> 2015-07-21T14:19:59.941+0000: 2876138.875: Total time for which
> application threads were stopped: 0.0802640 seconds
> 2015-07-21T14:20:01.340+0000: 2876140.274: Total time for which
> application threads were stopped: 0.0835670 seconds
> 2015-07-21T14:20:02.744+0000: 2876141.678: Total time for which
> application threads were stopped: 0.0842440 seconds
> 2015-07-21T14:20:04.143+0000: 2876143.077: Total time for which
> application threads were stopped: 0.0841630 seconds
> 2015-07-21T14:20:05.541+0000: 2876144.475: Total time for which
> application threads were stopped: 0.0839850 seconds
>
> Heap after GC invocations=2273737 (full 101):
>  par new generation   total 1474560K, used 106131K
> [0x00000005fae00000, 0x000000065ee00000, 0x000000065ee00000)
>   eden space 1310720K,   0% used [0x00000005fae00000,
> 0x00000005fae00000, 0x000000064ae00000)
>   from space 163840K,  64% used [0x000000064ae00000,
> 0x00000006515a4ee0, 0x0000000654e00000)
>   to   space 163840K,   0% used [0x0000000654e00000,
> 0x0000000654e00000, 0x000000065ee00000)
>  concurrent mark-sweep generation total 6750208K, used 1316691K
> [0x000000065ee00000, 0x00000007fae00000, 0x00000007fae00000)
>  concurrent-mark-sweep perm gen total 49336K, used 29520K
> [0x00000007fae00000, 0x00000007fde2e000, 0x0000000800000000)
> }
> 2015-07-21T14:12:05.683+0000: 2875664.617: Total time for which
> application threads were stopped: 0.0830280 seconds
> {Heap before GC invocations=2273737 (full 101):
>  par new generation   total 1474560K, used 1416851K
> [0x00000005fae00000, 0x000000065ee00000, 0x000000065ee00000)
>   eden space 1310720K, 100% used [0x00000005fae00000,
> 0x000000064ae00000, 0x000000064ae00000)
>   from space 163840K,  64% used [0x000000064ae00000,
> 0x00000006515a4ee0, 0x0000000654e00000)
>   to   space 163840K,   0% used [0x0000000654e00000,
> 0x0000000654e00000, 0x000000065ee00000)
>  concurrent mark-sweep generation total 6750208K, used 1316691K
> [0x000000065ee00000, 0x00000007fae00000, 0x00000007fae00000)
>  concurrent-mark-sweep perm gen total 49336K, used 29520K
> [0x00000007fae00000, 0x00000007fde2e000, 0x0000000800000000)
>
> It seems like eden heap space is being constantly occupied by
> something which is later removed by gc...
>
>
> On Mon, Jul 20, 2015 at 9:18 AM, Jason Wee <peichieh@gmail.com> wrote:
>> just a guess, gc?
>>
>> On Mon, Jul 20, 2015 at 3:15 PM, Marcin Pietraszek <mpietraszek@opera.com>
>> wrote:
>>>
>>> Hello!
>>>
>>> I've noticed a strange CPU utilisation pattern on machines in our
>>> cluster. After a C* daemon restart it behaves normally, but a few
>>> weeks after the restart CPU usage starts to rise. Currently on one
>>> of the nodes (screenshots attached) the cpu load is ~4. Shortly
>>> before a restart the load rises to ~15 (our cassandra machines have
>>> 16 cpus).
>>>
>>> In that cluster we're bulkloading from a hadoop cluster with 1400
>>> reducers (200 parallel bulkloading tasks). After such a session of
>>> heavy bulkloading the number of pending compactions is quite high,
>>> but the node is able to clear them before the next "bulkloading
>>> session". We're also tracking the number of pending compactions, and
>>> most of the time it's 0.
>>>
>>> Our machines have a few gigs of free memory, ~7GB (17GB used), and
>>> it also seems like we aren't IO bound.
>>>
>>> Screenshots from our zabbix with CPU utilisation graphs:
>>>
>>> http://i60.tinypic.com/xas8q8.jpg
>>> http://i58.tinypic.com/24pifcy.jpg
>>>
>>> Do you guys know what could be causing such high load?
>>>
>>> --
>>> mp
>>
>>
