Hi Pedro,
Since you are using the RocksDB backend, RocksDB will consume some extra native memory, and the amount can be very large. By default, RocksDB keeps a `BloomFilter` in memory for every opened SST file, and the number of opened SST files is not limited by default, so in theory the native memory consumed by the bloom filters grows with the number of SST files. Besides the bloom filters, other parts of RocksDB also consume native memory, e.g. the write buffers (their size and maximum number), but in general these are not the main consumers. You can find more information about RocksDB's memory allocation here: https://github.com/facebook/rocksdb/wiki/Memory-usage-in-RocksDB. There is also an existing issue related to your question where you may find some useful information: https://issues.apache.org/jira/browse/FLINK-7289.
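As a rough illustration of why this grows, here is a back-of-the-envelope sketch (all constants are assumptions based on common RocksDB defaults, not measurements from your job):

```python
# Rough estimate of RocksDB native memory that is NOT counted in the JVM heap.
# Every constant below is an assumption (common RocksDB defaults) for illustration.

BLOOM_BITS_PER_KEY = 10          # typical bloom filter setting when enabled
KEYS_PER_SST = 1_000_000         # assumed number of keys per SST file
WRITE_BUFFER_SIZE = 64 * 2**20   # 64 MB, a common write_buffer_size default
MAX_WRITE_BUFFERS = 2            # a common max_write_buffer_number default

def estimate_native_mb(num_open_ssts: int, num_column_families: int) -> float:
    """Bloom filter memory grows with the number of open SSTs; write buffer
    memory grows with the number of column families (roughly one per
    registered Flink state per operator)."""
    filter_bytes = num_open_ssts * KEYS_PER_SST * BLOOM_BITS_PER_KEY / 8
    buffer_bytes = num_column_families * WRITE_BUFFER_SIZE * MAX_WRITE_BUFFERS
    return (filter_bytes + buffer_bytes) / 2**20

# With thousands of open SSTs, the filter memory alone reaches gigabytes:
print(round(estimate_native_mb(num_open_ssts=5000, num_column_families=20)))
```

Since the number of open SSTs is unbounded by default, this estimate keeps growing as the job compacts and writes new files, which is exactly why the container can exceed the JVM's configured limits.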

Concerning your individual questions:

1. Shouldn't the sum of the job manager memory and the task manager memory
account for all the memory allocated by Flink?  Am I missing any
configuration?

No, those settings only define the size of the memory controlled by the JVM. There is extra native memory consumed by RocksDB when you use the RocksDB backend.

2. How can I maintain the server working in this scenario?

Since you are using the RocksDB backend, the native memory consumed by RocksDB is hard to control precisely. The safest option is to turn off the filter cache (in general this is the main memory consumer, but turning it off will hurt read performance), and to also reduce the write buffer size and the maximum number of write buffers.
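To make the effect of those knobs concrete, here is a rough before/after comparison (a sketch with assumed per-column-family numbers; the real figures depend on your state size and access pattern):

```python
# Compare assumed "default" vs "tuned" RocksDB settings for one column family
# (Flink creates roughly one RocksDB column family per registered state).
# All values are assumptions for illustration, not measured numbers.

def per_column_family_mb(write_buffer_mb: int, max_write_buffers: int,
                         bloom_mb_per_cf: float) -> float:
    """Approximate native memory per column family: write buffers plus the
    bloom filters of that column family's SST files."""
    return write_buffer_mb * max_write_buffers + bloom_mb_per_cf

default_mb = per_column_family_mb(write_buffer_mb=64, max_write_buffers=2,
                                  bloom_mb_per_cf=300.0)   # filters kept in memory
tuned_mb   = per_column_family_mb(write_buffer_mb=32, max_write_buffers=2,
                                  bloom_mb_per_cf=0.0)     # filter cache turned off

# With e.g. 20 column families, the difference amounts to several gigabytes:
print(f"default: {20 * default_mb:.0f} MB, tuned: {20 * tuned_mb:.0f} MB")
```

The trade-off is read amplification: without bloom filters, point lookups may touch more SST files on disk, so expect slower reads in exchange for the bounded memory.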

3. I thought that RocksDB would do the job, but it didn't happen. 

The memory consumed by RocksDB cannot be precisely limited yet; you can only control it in a coarse-grained way through its options.
 
4. In the past, I have seen Flink taking a checkpoint of 3GB, but
allocating initially 35GB of RAM. Where does this extra memory come from?

I think the extra memory is native memory consumed by RocksDB, and most of it is used for filter caching. A checkpoint contains only the serialized current state, while the resident memory additionally includes bloom filters, write buffers, and caches, so the two numbers can differ by a large factor.

Since this type of question generally belongs on the `user` mailing list, I have redirected it there. I think Stefan (cc'd) could tell you more about your question; please correct me if I'm wrong.

Best,
Sihua



On 05/10/2018 03:12, Pedro Elias <pedrolge@poli.ufrj.br> wrote:
Hi,

I have Flink running on 2 docker images, one for the job manager, and one
for the task manager, with the configuration below.

64GB RAM machine
200 GB SSD used only by RocksDB

Flink's memory configuration file is like that:

jobmanager.heap.mb: 3072
taskmanager.heap.mb: 53248
taskmanager.memory.fraction: 0.7

I have a very large and heavy job running in this server. The problem is
that the task manager is trying to take more memory than defined on the
configuration and eventually crashes the server, although the heap never
reaches the maximum memory. The last memory log before crashing shows:

Memory usage stats: [HEAP: 44432/53248/53248 MB, NON HEAP: 157/160/-1 MB
(used/committed/max)]

But the memory used by the task manager container is near 64GB


I have some doubts regarding memory usage of Flink.


1. Shouldn't the sum of the job manager memory and the task manager memory
account for all the memory allocated by Flink?  Am I missing any
configuration?

2. How can I maintain the server working in this scenario?

3. I thought that RocksDB would do the job, but it didn't happen.

4. In the past, I have seen Flink taking a checkpoint of 3GB, but
allocating initially 35GB of RAM. Where does this extra memory come from?


Can anyone help me, please?

Thanks in advance.

Pedro Luis