cassandra-user mailing list archives

From Shravan C <chall...@outlook.com>
Subject Re: OOM on Apache Cassandra on 30 Plus node at the same time
Date Sat, 04 Mar 2017 05:11:18 GMT
We run C* with a 32 GB heap and all servers have 96 GB RAM. We use STCS. LCS is not an option
for us as we have frequent updates.


Thanks,
Shravan
________________________________
From: Thakrar, Jayesh <jthakrar@conversantmedia.com>
Sent: Friday, March 3, 2017 3:47:27 PM
To: Joaquin Casares; user@cassandra.apache.org
Subject: Re: OOM on Apache Cassandra on 30 Plus node at the same time


I had been fighting a similar battle, but am now over the hump for the most part.



Get info on the server config (e.g., memory, CPU, and free memory via "free -g")

Run "nodetool info" on the nodes to get heap and off-heap sizes

Run "nodetool tablestats" or "nodetool tablestats <kespace>.<tablename>" on the
key large tables

Essentially, the purpose is to see whether you really had a true JVM OutOfMemoryError or whether
the machine itself was running out of memory.
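
For a quick look (the keyspace/table name here is just a placeholder):

free -g
nodetool info
nodetool tablestats my_keyspace.my_big_table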



Cassandra can use off-heap memory very well - so "nodetool info" will give you both heap and
off-heap sizes.



Also, what is the compaction strategy of your tables?



Personally, I have found STCS to be awful at large scale - when you have sstables that are
100+ GB in size.

See https://issues.apache.org/jira/browse/CASSANDRA-10821?focusedCommentId=15389451&page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel#comment-15389451



LCS seems better and should be the default (my opinion) unless you want DTCS.
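
Switching a table over is a single CQL statement (the table name below is hypothetical); note
that the existing sstables will be progressively recompacted into levels afterwards:

ALTER TABLE my_keyspace.my_big_table WITH compaction = {'class': 'LeveledCompactionStrategy'};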



A good description of all three compaction strategies is here: http://docs.scylladb.com/kb/compaction/









From: Joaquin Casares <joaquin@thelastpickle.com>
Date: Friday, March 3, 2017 at 11:34 AM
To: <user@cassandra.apache.org>
Subject: Re: OOM on Apache Cassandra on 30 Plus node at the same time



Hello Shravan,



Typically, asynchronous individual requests are recommended over batch statements: a batch creates
more work for the coordinator node, while individual requests sent with a TokenAwarePolicy go
directly to a replica that owns the data, perform a local disk seek, and return the requested
information.



The only time batch statements are ideal is when all child statements write to the same partition
key (even across multiple tables), since with the same hashing algorithm (like Murmur3) those
writes all land on the same replicas.
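
A rough sketch with the DataStax Java driver (the contact point, keyspace, table, and column
names below are made up for illustration):

import java.util.Date;
import com.datastax.driver.core.BatchStatement;
import com.datastax.driver.core.Cluster;
import com.datastax.driver.core.PreparedStatement;
import com.datastax.driver.core.Session;
import com.datastax.driver.core.policies.DCAwareRoundRobinPolicy;
import com.datastax.driver.core.policies.TokenAwarePolicy;

public class AsyncVsBatch {
    public static void main(String[] args) {
        // TokenAwarePolicy routes each request straight to a replica that
        // owns the partition, avoiding an extra coordinator hop.
        Cluster cluster = Cluster.builder()
                .addContactPoint("127.0.0.1")
                .withLoadBalancingPolicy(
                        new TokenAwarePolicy(DCAwareRoundRobinPolicy.builder().build()))
                .build();
        Session session = cluster.connect();

        // Hypothetical table: ks.events(sensor_id text, ts timestamp, value double,
        //                               PRIMARY KEY (sensor_id, ts))
        PreparedStatement ps = session.prepare(
                "INSERT INTO ks.events (sensor_id, ts, value) VALUES (?, ?, ?)");

        // Preferred for writes to many different partitions: individual async
        // requests (in production, throttle the number of in-flight futures).
        for (int i = 0; i < 100; i++) {
            session.executeAsync(ps.bind("sensor-" + i, new Date(), 1.0));
        }

        // The one good use of a batch: every child statement targets the SAME
        // partition key, so the whole batch lands on one replica set.
        BatchStatement batch = new BatchStatement(BatchStatement.Type.UNLOGGED);
        for (int i = 0; i < 10; i++) {
            batch.add(ps.bind("sensor-42", new Date(), (double) i));
        }
        session.execute(batch);

        cluster.close();
    }
}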



Could you provide a bit of insight into what the batch statement was trying to accomplish
and how many child statements were bundled up within that batch?



Cheers,



Joaquin


Joaquin Casares

Consultant

Austin, TX



Apache Cassandra Consulting

http://www.thelastpickle.com





On Fri, Mar 3, 2017 at 11:18 AM, Shravan Ch <challa17@outlook.com> wrote:

Hello,

More than 30 Cassandra servers in the primary DC went down with the OOM exception below. What
puzzles me is the scale at which it happened (all within the same minute). I will share some
more details below.

System Log: http://pastebin.com/iPeYrWVR

GC Log: http://pastebin.com/CzNNGs0r

During the OOM I saw a lot of WARNings like the one below (these had been there for quite some
time, maybe weeks):
WARN  [SharedPool-Worker-81] 2017-03-01 19:55:41,209 BatchStatement.java:252 - Batch of prepared
statements for [keyspace.table] is of size 225455, exceeding specified threshold of 65536
by 159919.
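
(For reference: that 65536-byte threshold maps to batch_size_warn_threshold_in_kb: 64 in our
cassandra.yaml; if I remember right, the stock 2.1 default is much lower, 5 KB, so ours had
been raised.)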

Environment:
We are using ApacheCassandra-2.1.9 on Multi DC cluster. Primary DC (more C* nodes on SSD and
apps run here)  and secondary DC (geographically remote and more like a DR to primary) on
SAS drives.
Cassandra config:

Java 1.8.0_65
Garbage Collector: G1GC
memtable_allocation_type: offheap_objects
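
(As I understand it, offheap_objects keeps memtable cell data off the Java heap rather than
on it, which is why the off-heap numbers matter for us.)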

Since this OOM I have been seeing huge hint pile-ups on the majority of the nodes, and the
pending hints keep going up. I have increased the HintedHandoff core threads to 6, but that
did not help (I admit that I only tried this on one node so far).

nodetool compactionstats -H
pending tasks: 3
compaction type    keyspace    table    completed    total       unit     progress
     Compaction      system    hints      28.5 GB    92.38 GB    bytes      30.85%
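
(If I understand correctly, that compaction is on the system.hints table itself; nodetool
truncatehints could drop the stored hints if we decide to rely on repair instead, but I would
like to understand the root cause first.)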


Appreciate your inputs here.

Thanks,

Shravan


