cassandra-user mailing list archives

From Frank Ng <fnt...@gmail.com>
Subject Re: Intermittent long application pauses on nodes
Date Wed, 29 Jan 2014 19:02:14 GMT
Thanks for the update.  Our logs indicated that there were 0 pending for
CompactionManager at that time.  Also, there were no nodetool repairs
running at that time.  The log statements above state that the application
had to stop to reach a safepoint.  Yet they don't say what is requesting
the safepoint.
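
For anyone wanting to pull the long stops out of a gc.log, here is a minimal
sketch in Python.  It assumes each "Total time for which application threads
were stopped" entry sits on a single line in the actual file (the entries
quoted below are wrapped by the mail client).  The function name
long_pauses and the 1-second threshold are hypothetical choices for
illustration, not anything from Cassandra or the JVM.

```python
import re

# Pattern for the "stopped" lines quoted in this thread, assumed to be
# one entry per line in the real gc.log, e.g.
# 2014-01-28T23:14:11.161-0500: 1337833.401: Total time for which application threads were stopped: 51.8190260 seconds
STOPPED = re.compile(
    r"^(?P<stamp>\S+): (?P<uptime>[\d.]+): Total time for which "
    r"application threads were stopped: (?P<secs>[\d.]+) seconds"
)


def long_pauses(lines, threshold_secs=1.0):
    """Yield (timestamp, seconds) for each stop of at least threshold_secs.

    threshold_secs is an illustrative default, not a recommended value.
    """
    for line in lines:
        match = STOPPED.match(line.strip())
        if match and float(match.group("secs")) >= threshold_secs:
            yield match.group("stamp"), float(match.group("secs"))


# The three entries from the original report, unwrapped to one line each.
sample = [
    "2014-01-28T23:11:12.183-0500: 1337654.424: Total time for which "
    "application threads were stopped: 0.091450 seconds",
    "2014-01-28T23:14:11.161-0500: 1337833.401: Total time for which "
    "application threads were stopped: 51.8190260 seconds",
    "2014-01-28T23:14:19.870-0500: 1337842.111: Total time for which "
    "application threads were stopped: 0.005470 seconds",
]

pauses = list(long_pauses(sample))
```

With a real file, something like long_pauses(open("gc.log")) would do the
same scan.  This only locates the long stops; it does not say what requested
the safepoint.  On HotSpot JVMs of this era, the flags
-XX:+PrintSafepointStatistics -XX:PrintSafepointStatisticsCount=1 may help
there, since they report the VM operation behind each safepoint.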


On Wed, Jan 29, 2014 at 1:20 PM, Shao-Chuan Wang <
shaochuan.wang@bloomreach.com> wrote:

> We had similar latency spikes when pending compactions couldn't keep up or
> repair/streaming was taking too many cycles.
>
>
> On Wed, Jan 29, 2014 at 10:07 AM, Frank Ng <fntemk@gmail.com> wrote:
>
>> All,
>>
>> We've been having intermittent long application pauses (version 1.2.8)
>> and are not sure if it's a Cassandra bug.  During these pauses, there are
>> dropped messages in the cassandra log file along with the node seeing other
>> nodes as down.  We've turned on gc logging and the following is an example
>> of a long "stopped" or pause event in the gc.log file.
>>
>> 2014-01-28T23:11:12.183-0500: 1337654.424: Total time for which
>> application threads were stopped: 0.091450 seconds
>> 2014-01-28T23:14:11.161-0500: 1337833.401: Total time for which
>> application threads were stopped: 51.8190260 seconds
>> 2014-01-28T23:14:19.870-0500: 1337842.111: Total time for which
>> application threads were stopped: 0.005470 seconds
>>
>> As seen above, there was a 0.091450 secs pause, then a 51.8190260 secs
>> pause.  There were no GC log events between those 2 log statements.  Since
>> there are no GC log entries in between, something other than GC must be
>> causing the long time to reach a safepoint.
>>
>> Could there be a Cassandra thread that is taking a long time to reach a
>> safepoint, and if so, what is it trying to do? Along with the node seeing
>> other nodes as down in the cassandra log file, the StatusLogger shows 1599
>> Pending in ReadStage and 9 Pending in MutationStage.
>>
>> There is mention of batch revocation of biased locks in Cassandra as a
>> possible cause (not GC) via:
>> http://www.mail-archive.com/user@cassandra.apache.org/msg34401.html
>>
>> We have JNA, no swap, and the cluster runs fine besides these
>> intermittent long pauses, which can cause a node to appear down to other
>> nodes.  Any ideas as to the cause of the long pause above? It does not
>> seem related to GC.
>>
>> thanks.
>>
>>
>
