cassandra-user mailing list archives

Site index · List index
Message view « Date » · « Thread »
Top « Date » · « Thread »
From Benedict Elliott Smith <belliottsm...@datastax.com>
Subject Re: Weird GC
Date Fri, 31 Jan 2014 14:09:38 GMT
You should expect to see lines of output like:

         vmop                    [threads: total initially_running
wait_to_block]    [time: spin block sync cleanup vmop] page_trap_count
0.436: Deoptimize                       [      10          0
   0    ]      [     0     0     0     0     0    ]  0
1.437: no vm operation                  [      18          0
   1    ]      [     0     0     0     0     0    ]  0
1.762: Deoptimize                       [      21          0
   0    ]      [     0     0     0     0     0    ]  0
2.764: no vm operation                  [     160          0
   1    ]      [     0     0     0     0     0    ]  0
2.876: Deoptimize                       [     161          0
   0    ]      [     0     0     0     0     0    ]  0
4.503: EnableBiasedLocking              [     164          0
   0    ]      [     0     0     0     0     0    ]  0
6.916: RevokeBias                       [     164          0
   0    ]      [     0     0     0     0     0    ]  0

You're looking for any of these lines printed at or around one of your
unexpectedly long pauses.




On 31 January 2014 10:40, Joel Samuelsson <samuelsson.joel@gmail.com> wrote:

> Thanks for your help.
>
> I've added those flags as well as some others I saw in another thread that
> redirects stdout to a file. What information is it that you need?
>
>
> 2014-01-29 Benedict Elliott Smith <belliottsmith@datastax.com>:
>
> It's possible the time attributed to GC is actually spent somewhere else;
>> a multitude of tasks may occur during the same safepoint as a GC. We've
>> seen some batch revoke of biased locks take a long time, for instance;
>> *if* this is happening in your case, and we can track down which
>> objects, I would consider it a bug and we may be able to fix it.
>>
>> -XX:+PrintSafepointStatistics -XX:PrintSafepointStatisticsCount=1
>>
>>
>> On 29 January 2014 16:23, Joel Samuelsson <samuelsson.joel@gmail.com>wrote:
>>
>>> Hi,
>>>
>>> We've been trying to figure out why we have so long and frequent
>>> stop-the-world GC even though we have basically no load.
>>>
>>> Today we got a log of a weird GC that I wonder if you have any theories
>>> of why it might have happened.
>>>
>>> A plot of our heap at the time, paired with the GC time from the
>>> Cassandra log:
>>> http://imgur.com/vw5rOzj
>>> -The blue line is the ratio of Eden space used (i.e. 1.0 = full)
>>> -The red line is the ratio of Survivor0 space used
>>> -The green line is the ratio of Survivor1 space used
>>> -The teal line is the ratio of Old Gen space used
>>> -The pink line shows during which period of time a GC happened (from the
>>> Cassandra log)
>>>
>>> Eden space is filling up and being cleared as expected in the first and
>>> last hill but on the middle one, it takes two seconds to clear Eden (note
>>> that Eden has ratio 1 for 2 seconds). Neither the survivor spaces nor old
>>> generation increase significantly afterwards.
>>>
>>> Any ideas of why this might be happening?
>>> We have swap disabled, JNA enabled, no CPU spikes at the time, no disk
>>> I/O spikes at the time. What else could be causing this?
>>>
>>> /Joel Samuelsson
>>>
>>
>>
>

Mime
View raw message