flink-issues mailing list archives

From "Stephan Ewen (JIRA)" <j...@apache.org>
Subject [jira] [Commented] (FLINK-4545) Flink automatically manages TM network buffer
Date Mon, 20 Mar 2017 10:48:41 GMT

    [ https://issues.apache.org/jira/browse/FLINK-4545?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15932481#comment-15932481 ]

Stephan Ewen commented on FLINK-4545:

[~zjwang] The logic outlined above is tailored towards the case where you have a container
or JVM of a certain size and want to configure how much memory goes to what component.

In the case where you actually want to compute the size of the container (as in the fine-grained
resource configuration code), we probably need a configuration parameter for the network memory
to add to each container. What do you think?

> Flink automatically manages TM network buffer
> ---------------------------------------------
>                 Key: FLINK-4545
>                 URL: https://issues.apache.org/jira/browse/FLINK-4545
>             Project: Flink
>          Issue Type: Wish
>          Components: Network
>            Reporter: Zhenzhong Xu
> Currently, the number of network buffers per task manager is preconfigured, and the memory
> is pre-allocated through the taskmanager.network.numberOfBuffers config. In a job DAG with a shuffle
> phase, this number can grow very large depending on the TM cluster size. The formula for calculating
> the buffer count is documented here (https://ci.apache.org/projects/flink/flink-docs-master/setup/config.html#configuring-the-network-buffers).
> #slots-per-TM^2 * #TMs * 4
> In a standalone deployment, we may need to control the task manager cluster size dynamically
> and then leverage the upcoming Flink feature that supports scaling job parallelism/rescaling
> at runtime.
> If the buffer count config is static at runtime and cannot be changed without restarting
> the task manager process, this may add latency and complexity to the scaling process. I am wondering
> whether there has already been any discussion about whether the network buffers should be automatically
> managed by Flink, or whether Flink should at least expose an API that allows them to be reconfigured. Let me know if
> there is an existing JIRA that I should follow.
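
To make the rule of thumb quoted in the issue description (#slots-per-TM^2 * #TMs * 4) concrete, here is a minimal sketch in Python. It is not Flink code: the function names are hypothetical, and the 32 KiB buffer size is an assumption used only to turn the buffer count into a rough memory figure. It illustrates why the value has to be revisited whenever the TM cluster size changes.

```python
def required_network_buffers(slots_per_tm: int, num_tms: int, factor: int = 4) -> int:
    """Buffer count from the quoted rule of thumb: slots-per-TM^2 * #TMs * factor."""
    if slots_per_tm <= 0 or num_tms <= 0:
        raise ValueError("slots_per_tm and num_tms must be positive")
    return slots_per_tm ** 2 * num_tms * factor


def buffer_memory_bytes(num_buffers: int, buffer_size_bytes: int = 32 * 1024) -> int:
    """Memory pre-allocated for the buffers; the 32 KiB buffer size is an assumption."""
    return num_buffers * buffer_size_bytes


if __name__ == "__main__":
    # Hypothetical cluster: 8 slots per TM, growing from 20 to 40 TMs.
    for tms in (20, 40):
        buffers = required_network_buffers(slots_per_tm=8, num_tms=tms)
        mib = buffer_memory_bytes(buffers) / (1024 * 1024)
        print(f"{tms} TMs -> {buffers} buffers (~{mib:.0f} MiB per TM)")
```

In this hypothetical setup, doubling the number of TMs doubles the buffer count each TM needs, so a statically configured value is either too small after scaling up or wastefully large before it.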

This message was sent by Atlassian JIRA
