flink-dev mailing list archives

From "Zhijiang(wangzhijiang999)" <wangzhijiang...@aliyun.com>
Subject Re: Dataset and eager scheduling
Date Fri, 03 Mar 2017 07:41:05 GMT
    As I understand it, if you do not mind wasting some resources and you are sure the cluster
has enough of them, you can set the EAGER schedule mode for a batch job.
    From the optimizer's side, if the PIPELINED_FORCED hint is not set for ExecutionMode, then
for some special topologies the optimizer will choose the BATCH DataExchangeMode to avoid the
risk of deadlock. That means the producer tasks are deployed first and write their output. Only
after the producer tasks finish are the consumer tasks scheduled, and they then start to consume
the data. This is exactly how the LAZY_FROM_SOURCES schedule mode behaves. If you use EAGER mode
instead in this case, the consumer tasks may do nothing after startup until the producer tasks
finish, so resources are wasted. With the PIPELINED DataExchangeMode, however, EAGER schedule
mode can make sense, because a consumer task can request data as soon as the producer task
outputs its first data.
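
    To make the trade-off concrete, here is a small toy model in plain Java. It is only a
sketch and not Flink code: the class SchedulingSketch, its methods, and the time values are all
hypothetical, and it assumes that under LAZY_FROM_SOURCES a consumer is deployed exactly when
its input becomes consumable (after the producer finishes for BATCH, at the first output for
PIPELINED), while under EAGER it is deployed at time 0 together with the producer.

```java
// Toy model only: these types mirror the names used in this thread,
// but they are NOT Flink's classes and nothing here calls Flink.
final class SchedulingSketch {
    enum ScheduleMode { LAZY_FROM_SOURCES, EAGER }
    enum DataExchangeMode { BATCH, PIPELINED }

    // Time at which the consumer can first read input from the producer.
    static long firstInputTime(DataExchangeMode exchange,
                               long firstOutputAt, long producerFinishAt) {
        // BATCH: the result is readable only once the producer has finished.
        // PIPELINED: the result is readable from the producer's first output.
        return exchange == DataExchangeMode.BATCH ? producerFinishAt : firstOutputAt;
    }

    // Time at which the consumer task is deployed and starts holding a slot.
    static long deployTime(ScheduleMode mode, DataExchangeMode exchange,
                           long firstOutputAt, long producerFinishAt) {
        if (mode == ScheduleMode.EAGER) {
            return 0L; // assumption: all tasks are deployed at job start
        }
        // assumption: lazy scheduling deploys the consumer when input is consumable
        return firstInputTime(exchange, firstOutputAt, producerFinishAt);
    }

    // How long the consumer holds a slot without having anything to read.
    static long consumerIdleTime(ScheduleMode mode, DataExchangeMode exchange,
                                 long firstOutputAt, long producerFinishAt) {
        return firstInputTime(exchange, firstOutputAt, producerFinishAt)
                - deployTime(mode, exchange, firstOutputAt, producerFinishAt);
    }

    public static void main(String[] args) {
        long firstOutputAt = 5;      // hypothetical time units
        long producerFinishAt = 100; // hypothetical time units
        for (ScheduleMode mode : ScheduleMode.values()) {
            for (DataExchangeMode exchange : DataExchangeMode.values()) {
                System.out.println(mode + " + " + exchange + ": consumer idle for "
                        + consumerIdleTime(mode, exchange, firstOutputAt, producerFinishAt));
            }
        }
    }
}
```

    In this model the only combination with a long idle period is EAGER with BATCH exchange,
where the consumer waits for the whole runtime of the producer. EAGER with PIPELINED exchange
idles only until the first output arrives.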
    My understanding may not be entirely accurate, so any discussion is welcome!

------------------------------------------------------------------
From: CPC <achalil@gmail.com>
Sent: Thursday, March 2, 2017, 18:52
To: dev <dev@flink.apache.org>
Subject: Dataset and eager scheduling
Hi all,

Currently our team is trying to implement a runtime operator and is also
experimenting with the scheduler. We are trying to understand the batch
optimizer, but it will take some time. What we want to know is whether
changing the batch scheduling mode from LAZY_FROM_SOURCES to EAGER could
affect the optimizer. In other words, does the optimizer make any strong
assumption that the scheduling mode of batch jobs is always
LAZY_FROM_SOURCES?

