flink-issues mailing list archives

Site index · List index
Message view « Date » · « Thread »
Top « Date » · « Thread »
From "Stefan Richter (JIRA)" <j...@apache.org>
Subject [jira] [Commented] (FLINK-7289) Memory allocation of RocksDB can be problematic in container environments
Date Tue, 01 Aug 2017 17:54:00 GMT

    [ https://issues.apache.org/jira/browse/FLINK-7289?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16109411#comment-16109411

Stefan Richter commented on FLINK-7289:


I understand that cache memory is not freed, and this sounds exactly like what I would expect.
I would also expect that the consumed cache memory will not go down as long as it is still
below available cache memory configured from the OS perspective. If this becomes a problem
under YARN, this sounds like a problem of the cluster setup to me. Take this opinion with
a grain of sand, I am not an expert on YARN or container setups. 

Then this should not be an end user problem, but also not a Flink problem. It sounds like
an administrator and configuration problem to me. For example, this caching scenario should
also apply to all other filesystem reads / writes and not only to RocksDB. Manually dropping
OS file caches should never been required from any application or user, and if so it seems
like this is fixing the symptoms of a different problem. We can try to figure out more about
root problem and then maybe come up with a better documentation for the setup and configuration.
But I would disagree about introducing cache cleaning to Flink because writing to {{/proc/sys/vm/drop_caches}}
is an operation that requires root privileges.

Can you provide us with some more information, e.g. a detailed breakdown of the os memory
consumption after the process ended and the logs about the killed containers?

> Memory allocation of RocksDB can be problematic in container environments
> -------------------------------------------------------------------------
>                 Key: FLINK-7289
>                 URL: https://issues.apache.org/jira/browse/FLINK-7289
>             Project: Flink
>          Issue Type: Improvement
>          Components: State Backends, Checkpointing
>    Affects Versions: 1.2.0, 1.3.0, 1.4.0
>            Reporter: Stefan Richter
> Flink's RocksDB based state backend allocates native memory. The amount of allocated
memory by RocksDB is not under the control of Flink or the JVM and can (theoretically) grow
without limits.
> In container environments, this can be problematic because the process can exceed the
memory budget of the container, and the process will get killed. Currently, there is no other
option than trusting RocksDB to be well behaved and to follow its memory configurations. However,
limiting RocksDB's memory usage is not as easy as setting a single limit parameter. The memory
limit is determined by an interplay of several configuration parameters, which is almost impossible
to get right for users. Even worse, multiple RocksDB instances can run inside the same process
and make reasoning about the configuration also dependent on the Flink job.
> Some information about the memory management in RocksDB can be found here:
> https://github.com/facebook/rocksdb/wiki/Memory-usage-in-RocksDB
> https://github.com/facebook/rocksdb/wiki/RocksDB-Tuning-Guide
> We should try to figure out ways to help users in one or more of the following ways:
> - Some way to autotune or calculate the RocksDB configuration.
> - Conservative default values.
> - Additional documentation.

This message was sent by Atlassian JIRA

View raw message