flink-dev mailing list archives

Site index · List index
Message view « Date » · « Thread »
Top « Date » · « Thread »
From "Matthias J. Sax" <mj...@informatik.hu-berlin.de>
Subject [DISCUSS] Canceling Streaming Jobs
Date Tue, 26 May 2015 23:17:58 GMT
Hi,

currently, the only way to stop a streaming job is to "cancel" the job,
This has multiple disadvantage:
 1) a "clean" stopping is not possible (see
https://issues.apache.org/jira/browse/FLINK-1929 -- I think a clean stop
is a pre-requirement for FLINK-1929) and
 2) as a minor issue, all canceled jobs are listed as canceled in the
history (what is somewhat confusing for the user -- at least it was for
me when I started to work with Flink Streaming).

This issue was raised a few times already, however, no final conclusion
was there (if I remember correctly). I could not find a JIRA for it either.

From my understanding of the system, there would be two ways to
implement a nice way for stopping streaming jobs:

  1) "Task"s can be distinguished between "batch" and "streaming"
     -> canceling a batch jobs works as always
     -> canceling a streaming job only send a "canceling" signal to the
sources, and waits until the job finishes (ie, sources stop emitting
data and finish regularly, triggering the finishing of all operators).
For this case, streaming jobs are stopped in a "clean way" (as is the
input would have be finite) and the job will be listed as "finished" in
the history regularly.

  This approach has the advantage, that it should be simpler to
implement. However, the disadvantages are (1) a "hard canceling" of jobs
is not possible any more, and (2) Flink must be able to distinguishes
batch and streaming jobs (I don't think Flink runtime can distinguish
both right now?)

  2) A new message "terminate" (or similar) is introduced, that can only
be used for streaming jobs (would be ignored for batch jobs) that stops
the sources and waits until the job finishes regularly.

  This approach has the advantage, that current system behavior is
preserved (it only adds a few feature). The disadvantage is, that all
clients need to be touched and it must be clear to the user, that
"terminate" does not work for streaming jobs. If an error/warning should
be raised if a user tries to "terminate" a batch job, Flink must be able
to distinguish between batch and streaming jobs, too.  As an
alternative, "terminate" on batch jobs could be interpreted as "cancel",
too.


I personally think, that the second approach is better. Please give
feedback. If we can get to a conclusion how to implement it, I would
like to work on it.


-Matthias


Mime
View raw message