cassandra-user mailing list archives

From Vincent Rischmann
Subject Re: Out of memory and/or OOM kill on a cluster
Date Mon, 21 Nov 2016 13:57:00 GMT

We tried with 12 GB and 16 GB heaps; the problem eventually appeared with those too.

In this particular cluster we have 143 tables across 2 keyspaces.


We have one table with a max partition of 2.68 GB, one of 256 MB, and a
bunch with max sizes varying between roughly 10 MB and 100 MB. The rest
have max partitions below 10 MB.

On the biggest table, the 99th percentile is around 60 MB, the 98th
around 25 MB, and the 95th around 5.5 MB.
On the table with the 256 MB max, the 99th percentile is around 4.6 MB
and the 98th around 2 MB.

Could the 1% here really have that much impact? We do write a lot to
the biggest table and read from it quite often too, but I have no way
of knowing whether that big partition is ever read.
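To spot which tables hold these outliers without eyeballing the whole
nodetool cfstats dump, a small script can pull out the per-table maximum
compacted partition size. This is only a sketch: it assumes the Cassandra
2.1 cfstats output format (the "Table:" and "Compacted partition maximum
bytes:" lines), and the threshold is an arbitrary example value.

```python
def large_partitions(cfstats_output, threshold_bytes=100 * 1024 * 1024):
    """Return (table, max_partition_bytes) pairs over the threshold,
    largest first. Assumes `nodetool cfstats`-style text as input."""
    results = []
    table = None
    for line in cfstats_output.splitlines():
        line = line.strip()
        if line.startswith("Table:"):
            table = line.split(":", 1)[1].strip()
        elif line.startswith("Compacted partition maximum bytes:") and table:
            max_bytes = int(line.split(":", 1)[1].strip())
            if max_bytes >= threshold_bytes:
                results.append((table, max_bytes))
    return sorted(results, key=lambda r: r[1], reverse=True)

if __name__ == "__main__":
    # Hypothetical sample mimicking `nodetool cfstats` output.
    sample = """
    Keyspace: ks1
        Table: events
            Compacted partition maximum bytes: 2877989891
        Table: users
            Compacted partition maximum bytes: 268650950
        Table: sessions
            Compacted partition maximum bytes: 9887423
    """
    for name, size in large_partitions(sample):
        print("%s: %.2f GB" % (name, size / 1024.0 ** 3))
```

Feeding it `nodetool cfstats` output (e.g. via a pipe) gives a quick
shortlist of tables worth running cfhistograms against.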

On Mon, Nov 21, 2016, at 01:09 PM, Alexander Dejanovski wrote:

> Hi Vincent,


> one of the usual causes of OOMs is very large partitions.

> Could you check your nodetool cfstats output for large partitions?
> If you find one (or more), run nodetool cfhistograms on those tables
> to get a view of the partition size distribution.

> Thanks


> On Mon, Nov 21, 2016 at 12:01 PM Vladimir Yudovin
> <> wrote:

>> Did you try any value in the range 8-20 GB (e.g. 60-70% of physical
>> memory)?
>> Also, how many tables do you have across all keyspaces? Each table
>> can consume a minimum of about 1 MB of Java heap.

>> Best regards, Vladimir Yudovin, 

>> *Winguzone[1] - Hosted Cloud Cassandra. Launch your cluster in
>> minutes.*


>> ---- On Mon, 21 Nov 2016 05:13:12 -0500 *Vincent Rischmann
>> <>* wrote ----

>>> Hello,


>>> we have an 8-node Cassandra 2.1.15 cluster at work which has been
>>> giving us a lot of trouble lately.

>>> The problem is simple: nodes regularly die, either because of an
>>> out-of-memory exception or because the Linux OOM killer decides to
>>> kill the process.
>>> A couple of weeks ago we increased the heap to 20 GB hoping it
>>> would solve the out-of-memory errors, but in fact it didn't;
>>> instead of an out-of-memory exception, the OOM killer killed the JVM.

>>> We reduced the heap on some nodes to 8 GB to see if it would work
>>> better, but some nodes crashed again with an out-of-memory exception.

>>> I suspect some of our tables are badly modelled, which would cause
>>> Cassandra to allocate a lot of memory; however, I don't know how to
>>> prove that, or how to find which table is bad and which query is
>>> responsible.

>>> I tried looking at metrics in JMX, and tried profiling with Java
>>> Mission Control, but it didn't really help; it's possible I missed
>>> something, because I don't know exactly what to look for.

>>> Does anyone have advice for troubleshooting this?


>>> Thanks.

> -- 
> -----------------
> Alexander Dejanovski
> France
> @alexanderdeja
> 
> Consultant
> Apache Cassandra Consulting
