cassandra-user mailing list archives

From Yuri de Wit <yde...@gmail.com>
Subject Re: Sporadic high IO bandwidth and Linux OOM killer
Date Fri, 28 Dec 2018 18:48:08 GMT
We


On Fri, Dec 28, 2018, 4:23 PM Oleksandr Shulgin <oleksandr.shulgin@zalando.de> wrote:

> On Fri, Dec 7, 2018 at 12:43 PM Oleksandr Shulgin <oleksandr.shulgin@zalando.de> wrote:
>
>>
>> After a fresh JVM start the memory allocation looks roughly like this:
>>
>>              total       used       free     shared    buffers     cached
>> Mem:           14G        14G       173M       1.1M        12M       3.2G
>> -/+ buffers/cache:        11G       3.4G
>> Swap:           0B         0B         0B
>>
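In older versions of procps `free`, the "-/+ buffers/cache" row is derived from the first row: buffers and cache are subtracted from "used" and added to "free", because the kernel can largely reclaim that memory. A short Python check against the figures above (which are already rounded as printed, so the match is only approximate):

```python
def buffers_cache_line(used: float, free: float, buffers: float,
                       cached: float) -> tuple[float, float]:
    """Recompute the '-/+ buffers/cache' row of older procps `free`.

    Buffers and page cache are treated as reclaimable, so they are
    subtracted from "used" and added to "free".
    """
    return used - buffers - cached, free + buffers + cached


# Figures from the output above, converted to MiB (rounded as printed).
GIB = 1024
used_adj, free_adj = buffers_cache_line(
    used=14 * GIB, free=173, buffers=12, cached=3.2 * GIB
)
print(f"-/+ buffers/cache: used {used_adj / GIB:.1f}G, free {free_adj / GIB:.1f}G")
```

The same arithmetic shows why the later symptom matters: once "cached" drops to around 150M while "free" stays put, the adjusted "used" figure approaches the full 14G.
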
>> Then, within a number of days, the allocated disk cache shrinks all the
>> way down to unreasonably low values, such as 150M.  At the same time "free"
>> stays at the original level and "used" grows all the way up to 14G.
>> Shortly after that the node becomes unavailable because of heavy IO, and
>> ultimately, after some time, the JVM gets killed.
>>
>> Most importantly, the resident size of the JVM process stays at around
>> 11-12G the whole time, just as it was shortly after the start.  How can we
>> find where the rest of the memory gets allocated?  Is it just some sort of
>> malloc fragmentation?
>>
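One way to see where a process's resident memory goes, and to test the mmap hypothesis, is to break `/proc/<pid>/smaps` down per mapping. The Python sketch below sums the `Rss` field per backing path; it assumes the standard Linux smaps layout, and the `-Data.db`/`-Index.db` suffixes used to pick out SSTable components are an assumption about Cassandra's file naming rather than something stated in this thread.

```python
import re
from collections import defaultdict

# Mapping header lines start with an address range such as "7f3a...-7f3b... ".
HEADER = re.compile(r"^[0-9a-f]+-[0-9a-f]+\s")


def rss_by_mapping(smaps_text: str) -> dict[str, int]:
    """Sum the resident size (Rss, in kB) of all mappings, keyed by path.

    Anonymous mappings (no pathname column) are grouped under "[anon]";
    pseudo-paths such as "[heap]" or "[stack]" are kept as-is.
    """
    totals: dict[str, int] = defaultdict(int)
    current = "[anon]"
    for line in smaps_text.splitlines():
        if HEADER.match(line):
            # Columns: address perms offset dev inode [pathname]
            parts = line.split(None, 5)
            current = parts[5].strip() if len(parts) == 6 else "[anon]"
        elif line.startswith("Rss:"):
            # Field lines look like "Rss:   4096 kB".
            totals[current] += int(line.split()[1])
    return dict(totals)


def sstable_rss_kb(totals: dict[str, int],
                   suffixes: tuple[str, ...] = ("-Data.db", "-Index.db")) -> int:
    """Total Rss of mappings whose path ends in one of the given suffixes.

    The default suffixes are an assumption about SSTable component names.
    """
    return sum(
        kb
        for path, kb in totals.items()
        # Files unlinked while still mapped get " (deleted)" appended.
        if path.removesuffix(" (deleted)").endswith(suffixes)
    )
```

For example, pass `open(f"/proc/{pid}/smaps").read()` for the Cassandra JVM's pid to `rss_by_mapping`, then compare `sstable_rss_kb(...)` with the process's total Rss. `pmap -x <pid>` gives a similar per-mapping RSS view without any code.
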
>
> For those following along at home, here is what we have ended up with so far:
>
> 0. Switched to the next-larger EC2 instance type, r4.xlarge, and the
> symptoms are gone.  Our bill is dominated by the price of EBS storage, so
> this is much less than a 2x increase in total.
>
> 1. We've noticed that increased memory usage correlates with the number of
> SSTables on disk.  When the number of files on disk decreases, available
> memory increases.  This leads us to think that the extra memory allocation
> is indeed due to the use of mmap.  It is not clear how we could account for
> that.
>
> 2. Improved our monitoring to include the number of files (computed as
> total minus free inodes).
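The file-count metric in item 2 can be read from `statvfs` as total minus free inodes, and a direct count of SSTable data files makes a useful cross-check. A minimal Python sketch; the `-Data.db` suffix and whatever data directory you pass in are assumptions for illustration, not details from the thread.

```python
import os


def used_inodes(path: str) -> int:
    """Inodes in use on the filesystem containing `path` (total minus free)."""
    st = os.statvfs(path)
    # f_files is the total inode count, f_ffree the number still free.
    # Filesystems with dynamic inode allocation may report 0 for both.
    return st.f_files - st.f_ffree


def count_sstables(data_dir: str, suffix: str = "-Data.db") -> int:
    """Count files under `data_dir` whose name ends with `suffix`.

    The default suffix assumes one "-Data.db" component per SSTable.
    """
    return sum(
        1
        for _root, _dirs, files in os.walk(data_dir)
        for name in files
        if name.endswith(suffix)
    )
```

The inode figure counts every file on the filesystem (all SSTable components, commit logs, and anything else stored there), so it tracks the SSTable count only roughly; the directory walk is more precise but costs more on large data directories.
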
>
> Given the cluster's resource utilization, it still feels like r4.large
> would be a good fit, if only we could figure out those few "missing" GB of
> RAM. ;-)
>
> Cheers!
> --
> Alex
>
>
