cassandra-user mailing list archives

From: Eric Evans
Subject: Re: OOM on Apache Cassandra on 30 Plus node at the same time
Date: Mon, 06 Mar 2017 14:31:01 GMT

On Fri, Mar 3, 2017 at 11:18 AM, Shravan Ch wrote:
> More than 30 Cassandra servers in the primary DC went down with the OOM
> exception below. What puzzles me is the scale at which it happened (all
> within the same minute). I will share some more details below.

You'd be surprised; when it's the result of aberrant data/workload,
then having many nodes OOM at once is more common than you might
think.

> System Log:

The traceback shows the OOM occurring during a read (a slice), not a
write. What do your data model and queries look like? Do you do
deletes (TTLs maybe)? Did the OOM result in a heap dump?

> GC Log:
> During the OOM I saw a lot of WARNings like the one below (these had been
> there for quite some time, maybe weeks):
> WARN  [SharedPool-Worker-81] 2017-03-01 19:55:41,209
> - Batch of prepared statements for [keyspace.table] is of size 225455,
> exceeding specified threshold of 65536 by 159919.
> Environment:
> We are using Apache Cassandra 2.1.9 on a multi-DC cluster: a primary DC
> (more C* nodes, on SSD, and the apps run here) and a secondary DC
> (geographically remote, more like a DR site for the primary) on SAS drives.
> Cassandra config:
> Java 1.8.0_65
> Garbage Collector: G1GC
> memtable_allocation_type: offheap_objects
> Since this OOM I am seeing a huge pile-up of hints on the majority of the
> nodes, and the pending hints keep going up. I increased HintedHandoff
> CoreThreads to 6 but that did not help (I admit I only tried this on one node).
> nodetool compactionstats -H
> pending tasks: 3
>   compaction type   keyspace   table   completed   total      unit    progress
>        Compaction   system     hints   28.5 GB     92.38 GB   bytes   30.85%
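
As an aside on the batch warning quoted above: it reports a batch of 225455 bytes against a threshold of 65536, which is where the "by 159919" comes from. A minimal Python sketch, assuming you want to pull those numbers out of the log and see how a client could split its writes into smaller groups. The regex and both function names are hypothetical helpers for illustration, not part of Cassandra or any driver, and the per-statement sizes are assumed inputs (Cassandra measures the serialized mutation size, which this does not compute).

```python
import re
from typing import Dict, Iterable, List

# Matches the numeric part of the batch-size warning text quoted above.
# The pattern is an illustrative assumption, not something Cassandra ships.
WARN_RE = re.compile(
    r"is of size (?P<size>\d+), exceeding specified threshold of "
    r"(?P<threshold>\d+) by (?P<excess>\d+)"
)


def parse_batch_warning(line: str) -> Dict[str, int]:
    """Extract size, threshold and excess (all in bytes) from a warning line."""
    match = WARN_RE.search(line)
    if match is None:
        raise ValueError("not a batch-size warning")
    return {name: int(value) for name, value in match.groupdict().items()}


def chunk_by_size(sizes: Iterable[int], max_bytes: int) -> List[List[int]]:
    """Greedily group statement indexes so each group totals <= max_bytes.

    A single statement larger than max_bytes ends up in a group of its own.
    """
    groups: List[List[int]] = []
    current: List[int] = []
    current_size = 0
    for index, size in enumerate(sizes):
        # Close the current group if adding this statement would overflow it.
        if current and current_size + size > max_bytes:
            groups.append(current)
            current, current_size = [], 0
        current.append(index)
        current_size += size
    if current:
        groups.append(current)
    return groups


warning = (
    "Batch of prepared statements for [keyspace.table] is of size 225455, "
    "exceeding specified threshold of 65536 by 159919."
)
info = parse_batch_warning(warning)
groups = chunk_by_size([30000, 30000, 30000, 70000, 1000], info["threshold"])
```

Here `groups` holds lists of statement indexes, each of which would be sent as its own smaller batch. Whether smaller batches have any bearing on the OOM itself is a separate question; the traceback points at the read path.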

Eric Evans
