flink-issues mailing list archives

From "Chesnay Schepler (Jira)" <j...@apache.org>
Subject [jira] [Commented] (FLINK-15843) Gracefully shutdown TaskManagers on Kubernetes
Date Fri, 09 Oct 2020 09:47:00 GMT

    [ https://issues.apache.org/jira/browse/FLINK-15843?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17210769#comment-17210769 ]

Chesnay Schepler commented on FLINK-15843:

[~felixzheng] ping

> Gracefully shutdown TaskManagers on Kubernetes
> ----------------------------------------------
>                 Key: FLINK-15843
>                 URL: https://issues.apache.org/jira/browse/FLINK-15843
>             Project: Flink
>          Issue Type: Improvement
>          Components: Deployment / Kubernetes
>    Affects Versions: 1.10.0
>            Reporter: Canbin Zheng
>            Priority: Major
>             Fix For: 1.12.0
> The current way of stopping a TaskManager instance when the JobManager sends a deletion
> request is to call {{KubernetesClient.pods().withName().delete}} directly. As a result, the
> instance is forcibly killed with a _KILL_ signal and has no chance to clean up, which can
> cause problems because we expect the process to terminate gracefully when it is no longer
> needed.
>
> According to the [Termination of Pods|#termination-of-pods] guide, Kubernetes first sends
> a _TERM_ signal to the main process in each container and may follow up with a forced
> _KILL_ signal once the graceful shutdown period has expired. The Unix signal is sent
> to the process that has PID 1 ([Docker Kill|https://docs.docker.com/engine/reference/commandline/kill/]).
> However, the TaskManagerRunner process is spawned by {{/opt/flink/bin/kubernetes-entry.sh}}
> and can never have PID 1, so it never receives the _TERM_ signal.
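
To illustrate the PID 1 point in the description above, here is a minimal Java sketch; it is not Flink code. The class name GracefulShutdownSketch, the cleanup() routine, and the isMainProcess() helper are hypothetical and exist only for this example. It assumes Java 9 or later (for ProcessHandle) and relies on the JVM running shutdown hooks on a _TERM_ signal but not on a _KILL_ signal.

```java
import java.util.concurrent.atomic.AtomicBoolean;

public class GracefulShutdownSketch {
    // Set to true once cleanup has run; a stand-in for real resource release.
    static final AtomicBoolean cleanedUp = new AtomicBoolean(false);

    // Hypothetical cleanup routine; a real process would release its own resources here.
    static void cleanup() {
        cleanedUp.set(true);
        System.out.println("cleanup finished");
    }

    // Builds the hook thread without registering it, so it can be exercised directly.
    static Thread newShutdownHook() {
        return new Thread(GracefulShutdownSketch::cleanup, "shutdown-hook");
    }

    // True if this JVM is the container's main process (PID 1).
    static boolean isMainProcess() {
        return ProcessHandle.current().pid() == 1L;
    }

    public static void main(String[] args) throws InterruptedException {
        Runtime.getRuntime().addShutdownHook(newShutdownHook());
        if (!isMainProcess()) {
            // A signal delivered only to PID 1 reaches this JVM only if the parent forwards it.
            System.out.println("not PID 1: TERM sent to PID 1 may never arrive here");
        }
        // Wait for a signal; the hook runs on TERM or normal exit, never on KILL.
        Thread.sleep(60_000);
    }
}
```

Run outside a container, the program reports that it is not PID 1. Sending it a _TERM_ signal triggers the hook, whereas a _KILL_ signal ends the process without running it.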

This message was sent by Atlassian Jira
