flink-issues mailing list archives

From "Ufuk Celebi (JIRA)" <j...@apache.org>
Subject [jira] [Commented] (FLINK-3065) Can't cancel failing jobs
Date Tue, 24 Nov 2015 09:33:11 GMT

    [ https://issues.apache.org/jira/browse/FLINK-3065?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15024091#comment-15024091 ]

Ufuk Celebi commented on FLINK-3065:

I think this is the issue discussed on the mailing list. Quoting [~StephanEwen]:

The problem here is that there is no such thing as proper thread killing in
Java (at least it makes everything unstable if you do). Threads need to
exit cooperatively.

The Kafka Function calls simply are uninterruptibly stuck and never return
(pretty bad bug in their Zookeeper Client). As far as I know one cannot
clean this up properly unless one kills the process.

We could try to work around this by running the Zookeeper commit in a
dedicated lightweight thread that shares no resources and thus does not
make the system unstable if stopped (against better advice ;-) )
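
For illustration only, here is a minimal Java sketch of that workaround idea. It is
not code from Flink or the Kafka connector: the names IsolatedCommit and runIsolated
are hypothetical, and the Runnable stands in for the blocking Zookeeper commit.
Note that the sketch abandons a stuck worker after a timeout instead of calling the
deprecated Thread.stop(), which is a variation on what the quote suggests.

```java
import java.util.concurrent.CountDownLatch;
import java.util.concurrent.TimeUnit;

// Hypothetical helper, not part of Flink or Kafka.
final class IsolatedCommit {

    /**
     * Runs a possibly hanging action on a dedicated daemon thread and waits
     * at most timeoutMillis for it to return or throw.
     *
     * @return true if the action ended within the timeout, false if the
     *         caller gave up and left the worker thread behind
     */
    static boolean runIsolated(Runnable action, long timeoutMillis) {
        CountDownLatch done = new CountDownLatch(1);
        Thread worker = new Thread(() -> {
            try {
                action.run();
            } finally {
                done.countDown(); // signal completion, even on exception
            }
        }, "isolated-commit");
        worker.setDaemon(true); // a stuck worker must not keep the JVM alive
        worker.start();

        try {
            if (done.await(timeoutMillis, TimeUnit.MILLISECONDS)) {
                return true;
            }
        } catch (InterruptedException e) {
            // The caller itself is being cancelled: keep the flag and give up.
            Thread.currentThread().interrupt();
        }
        // Cooperative request only; an uninterruptibly stuck call ignores it.
        worker.interrupt();
        return false;
    }
}
```

A caller would wrap the blocking call, for example
IsolatedCommit.runIsolated(() -> commitOffsets(), 5000) with a hypothetical
commitOffsets() method, and treat a false result as "commit abandoned". The task
thread can then exit cooperatively, while the leaked worker holds nothing except
whatever the commit call itself has captured.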

> Can't cancel failing jobs
> -------------------------
>                 Key: FLINK-3065
>                 URL: https://issues.apache.org/jira/browse/FLINK-3065
>             Project: Flink
>          Issue Type: Bug
>          Components: Command-line client, Webfrontend
>    Affects Versions: 0.10.0, 1.0.0
>            Reporter: Gyula Fora
>            Priority: Blocker
> It is currently not possible to stop a failing streaming job (for instance, if it gets stuck while failing).
> There is no cancel button in the web interface, and the job does not show up in the list of running jobs in the command line.
> This means jobs that get stuck while failing will eventually take down the cluster.

This message was sent by Atlassian JIRA
