flink-user mailing list archives

From Soumya Simanta <soumya.sima...@gmail.com>
Subject Flink 1.0 Critical memory issue/leak with a high throughput stream
Date Thu, 09 Jun 2016 08:29:27 GMT
We are using Flink in production and are running into high CPU and memory
usage. Our initial analysis points to growing memory utilization that
sharply increases the CPU spent on GC and ultimately takes down
the task managers.

We are running Flink 1.0 on YARN (on Amazon EMR).

[image: Inline image 1]

We are consuming a real-time stream from Kafka, creating some windows, and
making calls to Redis (using Rediscala, a non-blocking Redis client
library based on Akka).


We took some heap dumps, and it looks like we have a large number of
instances of akka.dispatch.AbstractQueueNode.
[image: Inline image 2]

And most of these are *unreachable*.

[image: Inline image 3]

It is not clear if this is:
1) an Akka issue [1,2]
2) a Flink issue
3) a Rediscala client library issue
4) an issue with the way we are using Scala Futures inside Flink code
5) an issue with running Flink on YARN
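
To make hypothesis 4 concrete: if each record triggers a non-blocking call and nothing limits how many calls are outstanding, pending work can queue up faster than it drains. The sketch below is not our actual job code. It is a minimal, dependency-free illustration in plain Java (the same idea applies to Scala Futures) of capping in-flight asynchronous calls with a semaphore. The class name, `fakeAsyncCall`, and `MAX_IN_FLIGHT` are all hypothetical, and the sleep merely stands in for a Redis round trip.

```java
import java.util.concurrent.CompletableFuture;
import java.util.concurrent.ExecutorService;
import java.util.concurrent.Executors;
import java.util.concurrent.Semaphore;
import java.util.concurrent.TimeUnit;
import java.util.concurrent.atomic.AtomicInteger;

public class BoundedAsyncCalls {
    // Hypothetical cap on the number of outstanding async calls.
    static final int MAX_IN_FLIGHT = 8;

    // Stand-in for a non-blocking client call (assumption: ~1 ms latency).
    static CompletableFuture<Void> fakeAsyncCall(ExecutorService pool) {
        return CompletableFuture.runAsync(() -> {
            try {
                Thread.sleep(1);
            } catch (InterruptedException e) {
                Thread.currentThread().interrupt();
            }
        }, pool);
    }

    // Issues one async call per record and returns the peak number of
    // calls that were outstanding at the same time.
    static int run(int records) throws InterruptedException {
        ExecutorService pool = Executors.newFixedThreadPool(4);
        Semaphore permits = new Semaphore(MAX_IN_FLIGHT);
        AtomicInteger inFlight = new AtomicInteger();
        AtomicInteger peak = new AtomicInteger();

        for (int i = 0; i < records; i++) {
            // Blocks the producer once the cap is reached, so pending
            // work cannot pile up without bound.
            permits.acquire();
            int now = inFlight.incrementAndGet();
            peak.accumulateAndGet(now, Math::max);

            fakeAsyncCall(pool).whenComplete((result, error) -> {
                // Decrement before releasing so the counter never
                // exceeds the number of permits held.
                inFlight.decrementAndGet();
                permits.release();
            });
        }

        pool.shutdown();
        pool.awaitTermination(10, TimeUnit.SECONDS);
        return peak.get();
    }

    public static void main(String[] args) throws InterruptedException {
        System.out.println("peak in-flight calls: " + run(200));
    }
}
```

Without the `permits.acquire()` line, the producer loop would submit every call immediately and the executor's queue would hold all of them, which is the kind of unbounded backlog we suspect in our case.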

Has anyone else seen a similar issue in Flink? We are planning to test this
again with a custom build that uses a newer version of Akka (see [1]).

[1] https://github.com/akka/akka/issues/19216
[2] https://groups.google.com/forum/#!topic/akka-user/D_qYP47Mc8Y


-Soumya
