ignite-user mailing list archives

From Juan Maia <j...@juanmaia.com>
Subject OutOfMemoryError
Date Wed, 21 Feb 2018 12:34:55 GMT
Hi all,


I'm getting an OutOfMemoryError in production. We have 16GB of RAM for the
JVM and 40GB for DirectMemory (non-heap). It clearly looks like objects are
not being collected from heap space: OldGen keeps growing and growing. This
started to happen after we changed the V in IgniteCache<K, V> from byte[] to
CacheEntry, which is a custom object that wraps byte[] plus some metadata.

*Detailed description*

After a new deployment we started to see this error on a few nodes in our
production cluster, and I had to roll back the deployment to the stable
version.

   - Number of servers:
      - 6 (we're soon growing this cluster to 14 machines)
   - Server config:
      - 128GB of RAM, 24 cores
   - JVM options:
      - -Xms8G -Xmx8G -XX:+UseG1GC -XX:MaxGCPauseMillis=100
   - Ignite version:
      - 1.7.0
   - Stack:
      - HTTP REST interface using Dropwizard + Ignite
   - Artifacts stored in memory:
      - From KBs to hundreds of MBs

We've never had this problem before and we've been running Ignite in
production for the past 1.5 years.

Besides some application logic, the only Ignite-related change was the type
of our IgniteCache instance, which went from:

private final IgniteCache<Long, byte[]> dataCache;

to:

private final IgniteCache<Long, CacheEntry> dataCache;

So now, instead of having the raw data stored directly in Ignite, we have a
wrapper object that contains the byte array plus some metadata about it.

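For illustration, a wrapper of this kind might look like the minimal sketch
below. The metadata fields (contentType, createdAtMillis) and the accessor
names are hypothetical placeholders, not the actual class from the
application. The sketch only shows the general shape: one large byte[] plus a
few small fields.

```java
import java.io.Serializable;

// Hypothetical sketch of a cache value that wraps the raw bytes plus metadata.
// Field names are illustrative assumptions, not the real application class.
final class CacheEntry implements Serializable {
    private static final long serialVersionUID = 1L;

    private final byte[] data;          // the raw artifact (KBs to hundreds of MBs)
    private final String contentType;   // assumed metadata field
    private final long createdAtMillis; // assumed metadata field

    CacheEntry(byte[] data, String contentType, long createdAtMillis) {
        if (data == null) {
            throw new IllegalArgumentException("data must not be null");
        }
        // No defensive copy: copying artifacts this large would double heap usage.
        this.data = data;
        this.contentType = contentType;
        this.createdAtMillis = createdAtMillis;
    }

    byte[] data() {
        return data;
    }

    String contentType() {
        return contentType;
    }

    long createdAtMillis() {
        return createdAtMillis;
    }

    int sizeBytes() {
        return data.length;
    }
}
```

The relevant difference from the byte[] version is that a value like this has
to be serialized as an object rather than passed through as a raw array.
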
I've been trying to profile the application looking for some sort of
"memory leak" and so far the only thing that I've found is this:

*Test suite*

I'm running exactly the same suite of tests for both versions of our
application, the STABLE one and the NEW one. The NEW one is where we
changed the IgniteCache instance.
JVM options on my local machine (for testing purposes):

   - -Xms2G -Xmx2G -XX:+UseG1GC -XX:MaxGCPauseMillis=100


This is what memory usage looks like after the test suite on the STABLE
version of our application:

[image: Inline image 2] [image: Inline image 1]

This is what memory usage looks like after the test suite on the NEW
version of our application:

[image: Inline image 4][image: Inline image 3]

On the STABLE version Old Gen goes back to ~250MB, same amount used at
startup, with empty cache.
On the NEW version Old Gen goes back to ~680MB, which is ~450MB more
than the amount used before storing objects in the cache.
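The Old Gen figures above come from the profiler, but the same before/after
comparison could also be scripted with the standard JMX memory pool beans. The
following is a minimal sketch, assuming a collector whose old-generation pool
name contains "Old Gen" (as with G1) or "Tenured". The class and method names
are made up for this example. Note that System.gc() is only a hint, so the
numbers are approximate.

```java
import java.lang.management.ManagementFactory;
import java.lang.management.MemoryPoolMXBean;
import java.lang.management.MemoryType;
import java.lang.management.MemoryUsage;

// Hypothetical helper for reading old-generation usage from inside the JVM.
final class OldGenProbe {
    private OldGenProbe() {
    }

    // Returns the bytes currently used in the old generation, or -1 if no
    // matching heap pool is found (pool names depend on the collector in use).
    static long oldGenUsedBytes() {
        for (MemoryPoolMXBean pool : ManagementFactory.getMemoryPoolMXBeans()) {
            String name = pool.getName();
            boolean looksLikeOldGen = name.contains("Old Gen") || name.contains("Tenured");
            if (pool.getType() == MemoryType.HEAP && looksLikeOldGen) {
                MemoryUsage usage = pool.getUsage();
                return usage == null ? -1 : usage.getUsed();
            }
        }
        return -1;
    }

    // Requests a collection first so the reading is closer to the retained size.
    // System.gc() is a request, not a guarantee.
    static long oldGenUsedAfterGc() {
        System.gc();
        return oldGenUsedBytes();
    }

    public static void main(String[] args) {
        long used = oldGenUsedAfterGc();
        System.out.println("Old Gen used after GC request (bytes): " + used);
    }
}
```

Calling oldGenUsedAfterGc() once with an empty cache and once after the test
suite has finished and the cache has been cleared would give the same kind of
baseline-versus-residual comparison as the screenshots.
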

The path for all those byte[] objects is:

[image: Inline image 5]

*Research*
Before posting this I searched for "Ignite MemoryLeak",
"BinaryMemoryAllocatorChunk" and "GridCircularBuffer" to see if someone had
hit this or a similar problem before. What I found were these posts/issues:

   - https://issues.apache.org/jira/browse/IGNITE-967


   1. From those tickets/issues it seems like Ignite relies heavily on
   ThreadLocal and that it's the application's (Dropwizard's?) job to clean
   it up. Is this the path that I should go down?
   2. I couldn't find any application-related object (from our company's
   packages) retained in memory at all. I think it's safe to assume that
   it's not an application bug, right?
   3. Any help here is really appreciated. If you need any more information
   from the profiling tool, I can get it. I'm using YourKit to profile the
   application. Thanks a lot.

Juan Maia.
