flink-issues mailing list archives

From "ASF GitHub Bot (JIRA)" <j...@apache.org>
Subject [jira] [Commented] (FLINK-4545) Flink automatically manages TM network buffer
Date Tue, 14 Mar 2017 04:30:41 GMT

    [ https://issues.apache.org/jira/browse/FLINK-4545?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15923554#comment-15923554 ]

ASF GitHub Bot commented on FLINK-4545:

Github user zhijiangW commented on the issue:

    @NicoK, thank you for the explanation. I have already traced the code in your local branch,
and I look forward to your further commits for the global pool.
    @StephanEwen, thanks for the further elaboration. As I understand it, each task can decide
the core number of buffers in its `LocalBufferPool` based on its input and output channels and the
configuration, and the maximum number of buffers based on the `ResultPartitionType`. All the
`LocalBufferPool`s together affect the total number of buffers in the `NetworkBufferPool`, so the
maximum memory usage may need to be considered.
    My concern is that the memory usage of the `NetworkBufferPool` should be determined before the
`TaskManager` starts, and that this memory should be counted in the total resources of the `TaskManager`.

    I am willing to do that as part of my current work on [Fine-grained Resource Configuration](https://issues.apache.org/jira/browse/FLINK-5131)
after this feature is complete.

> Flink automatically manages TM network buffer
> ---------------------------------------------
>                 Key: FLINK-4545
>                 URL: https://issues.apache.org/jira/browse/FLINK-4545
>             Project: Flink
>          Issue Type: Wish
>          Components: Network
>            Reporter: Zhenzhong Xu
> Currently, the number of network buffers per task manager is preconfigured, and the memory
is pre-allocated through the taskmanager.network.numberOfBuffers config. In a job DAG with a shuffle
phase, this number can grow very high depending on the TM cluster size. The formula for calculating
the buffer count is documented here (https://ci.apache.org/projects/flink/flink-docs-master/setup/config.html#configuring-the-network-buffers).
> #slots-per-TM^2 * #TMs * 4
> In a standalone deployment, we may need to control the task manager cluster size dynamically
and then leverage the upcoming Flink feature that supports scaling job parallelism/rescaling
at runtime.
> If the buffer count config is static at runtime and cannot be changed without restarting
the task manager process, this may add latency and complexity to the scaling process. I am wondering
if there has already been any discussion around whether the network buffers should be automatically
managed by Flink, or whether Flink should at least expose some API to allow them to be reconfigured. Let me know if
there is any existing JIRA that I should follow.
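
For illustration, the sizing formula quoted in the issue description (`#slots-per-TM^2 * #TMs * 4`) can be worked through as a small Java sketch. The class name, method names, and the 32 KiB buffer size are assumptions made for this example and are not taken from the thread or from Flink's code; the actual buffer size depends on the configured memory segment size.

```java
// Hypothetical helper, not part of Flink: evaluates the documented rule of thumb
// for taskmanager.network.numberOfBuffers and the memory it would pre-allocate.
final class NetworkBufferEstimate {

    // Assumption: one network buffer is 32 KiB. Adjust to the configured segment size.
    static final long ASSUMED_BUFFER_BYTES = 32L * 1024L;

    // #slots-per-TM^2 * #TMs * 4, as quoted in the issue description.
    static long bufferCount(int slotsPerTaskManager, int taskManagers) {
        if (slotsPerTaskManager <= 0 || taskManagers <= 0) {
            throw new IllegalArgumentException("slots and task managers must be positive");
        }
        long slotsSquared = (long) slotsPerTaskManager * slotsPerTaskManager;
        // multiplyExact throws on overflow instead of silently wrapping around.
        return Math.multiplyExact(Math.multiplyExact(slotsSquared, (long) taskManagers), 4L);
    }

    // Memory pre-allocated per TaskManager for that many buffers, in bytes.
    static long memoryBytes(long bufferCount) {
        return Math.multiplyExact(bufferCount, ASSUMED_BUFFER_BYTES);
    }

    public static void main(String[] args) {
        long buffers = bufferCount(4, 10);
        System.out.println(buffers + " buffers, " + memoryBytes(buffers) + " bytes");
    }
}
```

With 4 slots per TM and 10 TMs this gives 640 buffers, or 20 MiB at the assumed buffer size. Because the count grows linearly with the number of TMs, scaling the cluster out changes the required value, which is the situation the reporter describes for a static configuration.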
