flink-user mailing list archives

From "Haibo Sun" <sunhaib...@163.com>
Subject Re:Job leak in attached mode (batch scenario)
Date Wed, 17 Jul 2019 02:19:32 GMT
Hi, Qi


As far as I know, there is no such mechanism now. To achieve this, I think it would be necessary
to add a REST-based heartbeat mechanism between the Dispatcher and the Client. At present, perhaps
you can run a monitoring service to clean up these residual Flink clusters.
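To make the monitoring-service suggestion above concrete, here is a minimal sketch of its core decision logic. It assumes (hypothetically, none of this is a Flink API) that each client periodically writes a heartbeat timestamp keyed by the YARN application id of the cluster it launched, and that the list of running applications is polled from the YARN ResourceManager REST API (`GET /ws/v1/cluster/apps?states=RUNNING`):

```python
import time

# Hypothetical monitoring-service logic: decide which running Flink
# clusters are "leaked" because their submitting client stopped
# heartbeating. The heartbeat store (ZooKeeper/Redis/DB) and the
# YARN polling are assumed to exist elsewhere.

HEARTBEAT_TIMEOUT_S = 120  # treat a client as dead after 2 minutes of silence


def find_leaked_clusters(running_app_ids, client_heartbeats, now=None):
    """Return app ids whose submitting client has stopped heartbeating.

    running_app_ids: iterable of YARN application ids in state RUNNING.
    client_heartbeats: dict mapping app id -> last heartbeat (epoch seconds).
    """
    now = time.time() if now is None else now
    leaked = []
    for app_id in running_app_ids:
        last_beat = client_heartbeats.get(app_id)
        # No heartbeat recorded, or a stale one, means the client is gone.
        if last_beat is None or now - last_beat > HEARTBEAT_TIMEOUT_S:
            leaked.append(app_id)
    return leaked


# A cleanup loop would then kill each leaked cluster, e.g. via the YARN
# ResourceManager REST API:
#   PUT /ws/v1/cluster/apps/{app_id}/state   body: {"state": "KILLED"}
```

The timeout and the heartbeat storage are design choices; the only essential idea is that liveness of the client, not of the job, decides whether the cluster should survive.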


Best,
Haibo

At 2019-07-16 14:42:37, "qi luo" <luoqi.bd@gmail.com> wrote:
Hi guys,


We run thousands of Flink batch jobs every day. The batch jobs are submitted in attached mode,
so we can know from the client when a job has finished and then take further actions. To respond
to user abort actions, we submit the jobs with "--shutdownOnAttachedExit" so the Flink
cluster can be shut down when the client exits.
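For reference, a submission along these lines might look as follows (the jar name and target are hypothetical; `-sae` is the short form of `--shutdownOnAttachedExit`). The snippet only builds and prints the command line so it can be shown without a Flink installation:

```shell
# Sketch of an attached-mode submission to a per-job YARN cluster.
# --shutdownOnAttachedExit asks the cluster to shut down when the
# attached client exits cleanly; it cannot help if the client dies hard.
FLINK_CMD="flink run --shutdownOnAttachedExit -m yarn-cluster ./my-batch-job.jar"
echo "$FLINK_CMD"
```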


However, in some cases when the Flink client exits abnormally (such as an OOM), the shutdown
signal will not be sent to the Flink cluster, causing a "job leak". The lingering Flink
job will continue to run and never end, consuming large amounts of resources and even producing
unexpected results.


Does Flink have any mechanism to handle this scenario? (For comparison, Spark's client deploy
mode runs the driver on the client side, so the job exits when the client exits.) Any idea will be
much appreciated!


Thanks,
Qi