cassandra-user mailing list archives

From Aiman Parvaiz <ai...@flipagram.com>
Subject Re: Cassandra compaction appears to stall, node becomes partially unresponsive
Date Wed, 22 Jul 2015 22:22:08 GMT
Hi Bryan,
How's GC behaving on these boxes?

On Wed, Jul 22, 2015 at 2:55 PM, Bryan Cheng <bryan@blockcypher.com> wrote:

> Hi there,
>
> Within our Cassandra cluster, we're observing, on occasion, one or two
> nodes at a time becoming partially unresponsive.
>
> We're running 2.1.7 across the entire cluster.
>
> nodetool still reports the node as being healthy, and it does respond to
> some local queries; however, the CPU is pegged at 100%. One common thread
> (heh) each time this happens is that there always seems to be one or more
> compaction threads running (via nodetool tpstats), and some appear to be
> stuck (active count doesn't change, pending count doesn't decrease). A
> request for compactionstats hangs with no response.
>
> Each time we've seen this, the only thing that appears to resolve the
> issue is a restart of the Cassandra process; the restart does not appear to
> be clean, and requires one or more attempts (or a -9 on occasion).
>
> There does not seem to be any pattern to what machines are affected; the
> nodes thus far have been different instances on different physical machines
> and on different racks.
>
> Has anyone seen this before? Alternatively, when this happens again, what
> data can we collect that would help with the debugging process (in addition
> to tpstats)?
>
> Thanks in advance,
>
> Bryan
>
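
The stall signature Bryan describes (active count unchanged, pending count not
decreasing) can be checked mechanically by comparing two `nodetool tpstats`
snapshots taken some time apart. The following is only a rough sketch, not a
diagnostic tool: the function names are hypothetical, the parser assumes the
usual column order (Pool Name, Active, Pending, Completed, ...), and the extra
"completed count did not move" condition is an added heuristic that is not
part of the original description.

```python
import re


def parse_tpstats(text):
    """Parse `nodetool tpstats` text into {pool_name: (active, pending, completed)}.

    Assumes each pool row starts with the pool name followed by at least
    three integer columns; rows with fewer numeric columns are skipped.
    """
    pools = {}
    for line in text.splitlines():
        match = re.match(r"^(\S+)\s+(\d+)\s+(\d+)\s+(\d+)", line)
        if match:
            name = match.group(1)
            pools[name] = tuple(int(value) for value in match.group(2, 3, 4))
    return pools


def stalled_pools(before, after):
    """Return names of pools that look stuck between two snapshots.

    A pool is flagged when it has active tasks, the active count is
    unchanged, the pending count has not decreased, and (added heuristic)
    the completed count has not advanced.
    """
    stalled = []
    for name, (active, pending, completed) in after.items():
        if name not in before:
            continue
        prev_active, prev_pending, prev_completed = before[name]
        if (
            active > 0
            and active == prev_active
            and pending >= prev_pending
            and completed == prev_completed
        ):
            stalled.append(name)
    return stalled
```

To use it, save the output of two tpstats runs a few minutes apart and pass
the parsed results to `stalled_pools`; a pool such as CompactionExecutor
showing up in the result matches the symptom described above.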



-- 
*Aiman Parvaiz*
Lead Systems Architect
aiman@flipagram.com
cell: 213-300-6377
http://flipagram.com/apz
