flink-issues mailing list archives

From "ASF GitHub Bot (JIRA)" <j...@apache.org>
Subject [jira] [Commented] (FLINK-3443) JobManager cancel and clear everything fails jobs instead of cancelling
Date Sun, 21 Feb 2016 13:46:18 GMT

    [ https://issues.apache.org/jira/browse/FLINK-3443?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15156021#comment-15156021 ]

ASF GitHub Bot commented on FLINK-3443:
---------------------------------------

Github user uce commented on a diff in the pull request:

    https://github.com/apache/flink/pull/1669#discussion_r53566669
  
    --- Diff: flink-runtime/src/main/scala/org/apache/flink/runtime/jobmanager/JobManager.scala ---
    @@ -1487,7 +1487,7 @@ class JobManager(
               }
             }
     
    -        eg.fail(cause)
    +        eg.cancel()
    --- End diff --
    
    Yes, that would work during shutdown, but a `fail` that arrives right before `cancelAndClearEverything` can still trigger the restarting behaviour: once the job status is `FAILING`, further calls to `fail` are ignored, so the `fail` issued by `cancelAndClearEverything` never takes effect. `cancel` makes sure that this does not happen, because cancellation "overwrites" the failing behaviour.
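
The status handling described above can be sketched as a toy model. This is not Flink's actual `ExecutionGraph` code: the class `JobStatusSketch`, its `Status` enum, and the method `allTasksFinished` are hypothetical names, and the transitions are simplified to the ones mentioned in this comment (a second `fail` is ignored while `FAILING`; `cancel` can still take over from `FAILING`).

```java
/** Toy model of the job status transitions discussed here; NOT Flink's real ExecutionGraph. */
public class JobStatusSketch {

    /** Simplified, hypothetical subset of job states. */
    public enum Status { RUNNING, FAILING, CANCELLING, RESTARTING, CANCELED }

    private Status status = Status.RUNNING;

    /** Returns true if this call changed the status. */
    public synchronized boolean fail() {
        // Only a running job can start failing. A repeated fail while
        // already FAILING (or a fail after cancel) is ignored.
        if (status != Status.RUNNING) {
            return false;
        }
        status = Status.FAILING;
        return true;
    }

    /** Returns true if this call changed the status. */
    public synchronized boolean cancel() {
        // Cancellation "overwrites" an in-progress failure.
        if (status == Status.RUNNING || status == Status.FAILING) {
            status = Status.CANCELLING;
            return true;
        }
        return false;
    }

    /** Assumed hook: called once all tasks have reached a final state. */
    public synchronized void allTasksFinished() {
        if (status == Status.FAILING) {
            status = Status.RESTARTING; // a failed job is restarted
        } else if (status == Status.CANCELLING) {
            status = Status.CANCELED;   // a cancelled job stays down
        }
    }

    public synchronized Status status() {
        return status;
    }
}
```

In this model, `fail(); fail(); allTasksFinished()` ends in `RESTARTING` because the second `fail` has no effect, while `fail(); cancel(); allTasksFinished()` ends in `CANCELED`.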
    
    If we accept this as a corner case, we can keep the `fail` in `cancelAndClearEverything` and wrap the exception to suppress restarts in the common case.
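
A minimal sketch of the "wrap the exception" alternative, assuming the restart decision can inspect the failure cause. The names `SuppressRestartSketch`, `SuppressRestartsMarker`, and `shouldRestart`, as well as the `restartsLeft` parameter, are hypothetical and only illustrate the idea; they are not taken from the pull request.

```java
/** Illustrative only: a marker exception that tells the restart logic not to restart. */
public class SuppressRestartSketch {

    /** Hypothetical wrapper around the real failure cause. */
    public static class SuppressRestartsMarker extends RuntimeException {
        public SuppressRestartsMarker(Throwable cause) {
            super("Restarts suppressed for this failure", cause);
        }
    }

    /**
     * Hypothetical restart decision: restart only if attempts remain
     * and the failure was not wrapped in the marker exception.
     */
    public static boolean shouldRestart(Throwable failureCause, int restartsLeft) {
        return restartsLeft > 0 && !(failureCause instanceof SuppressRestartsMarker);
    }
}
```

With this shape, `shouldRestart(new SuppressRestartsMarker(cause), 3)` is false while `shouldRestart(cause, 3)` is true for an unwrapped cause. The corner case from the comment remains: if an unwrapped `fail` already moved the job to `FAILING`, the wrapped failure is ignored and the restart still happens.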


> JobManager cancel and clear everything fails jobs instead of cancelling
> -----------------------------------------------------------------------
>
>                 Key: FLINK-3443
>                 URL: https://issues.apache.org/jira/browse/FLINK-3443
>             Project: Flink
>          Issue Type: Bug
>          Components: Distributed Runtime
>            Reporter: Ufuk Celebi
>            Assignee: Ufuk Celebi
>
> When the job manager is shut down, it calls {{cancelAndClearEverything}}. This method does not {{cancel}} the {{ExecutionGraph}} instances, but {{fail}}s them, which can lead to an {{ExecutionGraph}} restart.
> I've noticed this in tests, where an old graph got into a loop of restarts.
> What I don't understand is why the futures etc. are not cancelled when the executor service is shut down.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)
