cassandra-user mailing list archives

From "Jeff Jirsa"<jji...@apache.org>
Subject Re: Cassandra Node keep going down
Date Mon, 17 Jul 2017 20:53:49 GMT


On 2017-07-14 11:23 (-0700), "Harika Vangapelli -T (hvangape - AKRAYA INC at Cisco)" <hvangape@cisco.com> wrote:
> We are using a Cassandra 3.x version.
> 

Which 3.x version? 3.11.0? 3.0.14? 3.7? Exact version is important. 

> Recently, our production database has been going through some instability. One of our
> nodes keeps going down, anywhere from once every 2 days to a few times a day. The node
> goes down because the JVM runs out of memory. Based on my investigation, I suspect this
> is related to writing and/or compacting the large partitions in some of our large data
> tables. Here is what might have happened:
> 1. The node went OOM because, under memory constraints, it was unable to deserialize or
> compact some large partitions.
> 2. Once we restarted it, usually a few hours later, the other nodes in the cluster
> tried to perform hinted handoff to the restarted node to patch the missing data. From
> then on, that node had to handle the handoff plus the normal data load, which made it
> even busier.
> 3. The node was not able to complete the handoff and went down again.
> 4. This cycle repeated again and again.
> 

Sounds like it's always the same node? You may want to try running 'nodetool scrub' on that
node and watching logs for errors that may indicate a corrupt file on disk, which would cause
the behavior you're seeing.
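
As a rough sketch of the log check suggested above: the helper below counts log lines that mention CorruptSSTableException or OutOfMemoryError. The function name scan_cassandra_log is hypothetical, and the default log path is an assumption (it is typical for package installs, but yours may differ). The keyspace and table names in the usage comment are placeholders.

```shell
#!/bin/sh
# Hypothetical helper: count log lines that hint at a corrupt SSTable
# or at heap exhaustion. Pass the log path as the first argument.
# The default path is an assumption, not something stated in this thread.
scan_cassandra_log() {
    log_file="${1:-/var/log/cassandra/system.log}"
    # grep -c prints the number of matching lines. It exits non-zero
    # when the count is 0, so "|| true" keeps the function's status clean.
    grep -c -E 'CorruptSSTableException|OutOfMemoryError' "$log_file" || true
}

# Example usage on the affected node (placeholder keyspace/table names):
#   nodetool scrub my_keyspace my_table
#   scan_cassandra_log /var/log/cassandra/system.log
```

A non-zero count is only a pointer to which log lines to read; the surrounding stack traces name the file or operation involved.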



