flink-dev mailing list archives

Site index · List index
Message view « Date » · « Thread »
Top « Date » · « Thread »
From "Zhijiang(wangzhijiang999)" <wangzhijiang...@aliyun.com>
Subject 回复:[jira] [Created] (FLINK-6057) Better default needed for num network buffers
Date Wed, 15 Mar 2017 09:21:53 GMT
Hi Luke,
       Yes, it is really not very friendly for users to set the number of network buffers.
And it may cause OOM exception if not set the proper amount for large scale job.
The proper amount should be calculated automatically by framework based on the number of input
and output channels for each task, and @Nico Kruber is working on 
this feature now. You can check this jira for some details "https://issues.apache.org/jira/browse/FLINK-4545".
------------------------------------------------------------------发件人:Luke Hutchison
(JIRA) <jira@apache.org>发送时间:2017年3月15日(星期三) 17:01收件人:dev
<dev@flink.apache.org>主 题:[jira] [Created] (FLINK-6057) Better default needed
for num network buffers
Luke Hutchison created FLINK-6057:

             Summary: Better default needed for num network buffers
                 Key: FLINK-6057
                 URL: https://issues.apache.org/jira/browse/FLINK-6057
             Project: Flink
          Issue Type: Bug
          Components: Core
    Affects Versions: 1.2.0
            Reporter: Luke Hutchison

Using the default environment,

ExecutionEnvironment env = ExecutionEnvironment.getExecutionEnvironment();

my code will sometimes fail with an error that Flink ran out of network buffers. To fix this, I have to do:

int numTasks = Runtime.getRuntime().availableProcessors();
config.setInteger(ConfigConstants.DEFAULT_PARALLELISM_KEY, numTasks);
config.setInteger(ConfigConstants.TASK_MANAGER_NUM_TASK_SLOTS, numTasks);
config.setInteger(ConfigConstants.TASK_MANAGER_NETWORK_NUM_BUFFERS_KEY, numTasks * 2048);

The default value of 2048 fails when I increase the degree of parallelism for a large Flink pipeline (hence the fix to set the number of buffers to numTasks * 2048).

This is particularly problematic because a pipeline can work fine on one machine, and when you start the pipeline on a machine with more cores, it can fail.

The default execution environment should pick a saner default based on the level of parallelism (or whatever is needed to ensure that the number of network buffers is not going to be exceeded for a given execution environment).

This message was sent by Atlassian JIRA
  • Unnamed multipart/alternative (inline, None, 0 bytes)
View raw message