flink-issues mailing list archives

From "ASF GitHub Bot (JIRA)" <j...@apache.org>
Subject [jira] [Commented] (FLINK-5759) Set an UncaughtExceptionHandler for all Thread Pools in JobManager
Date Fri, 10 Feb 2017 13:37:41 GMT

    [ https://issues.apache.org/jira/browse/FLINK-5759?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15861272#comment-15861272 ]

ASF GitHub Bot commented on FLINK-5759:

Github user StephanEwen commented on a diff in the pull request:

    --- Diff: flink-runtime/src/main/java/org/apache/flink/runtime/util/ExecutorThreadFactory.java
    @@ -18,49 +18,112 @@
     package org.apache.flink.runtime.util;
    +import java.lang.Thread.UncaughtExceptionHandler;
     import java.util.concurrent.ThreadFactory;
     import java.util.concurrent.atomic.AtomicInteger;
     import org.slf4j.Logger;
     import org.slf4j.LoggerFactory;
    +import static org.apache.flink.util.Preconditions.checkNotNull;
    + * A thread 
    --- End diff --
    True, incomplete, will fix that.

> Set an UncaughtExceptionHandler for all Thread Pools in JobManager
> ------------------------------------------------------------------
>                 Key: FLINK-5759
>                 URL: https://issues.apache.org/jira/browse/FLINK-5759
>             Project: Flink
>          Issue Type: Bug
>          Components: JobManager
>    Affects Versions: 1.2.0
>            Reporter: Stephan Ewen
>            Assignee: Stephan Ewen
>             Fix For: 1.3.0
> Currently, the thread pools of the {{JobManager}} do not have any {{UncaughtExceptionHandler}}.
> While uncaught exceptions are rare (Flink handles exceptions aggressively in most places),
> when exceptions slip through in these threads (which execute future responses and delayed
> actions), the JobManager may be in an inconsistent state and not function properly any more.
> We should add a handler that results in a process kill in the case of uncaught exceptions.
> Letting the JobManager be restarted by the respective cluster framework is the only guaranteed
> way to be safe.
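
The approach described in the issue can be sketched roughly as follows. This is a minimal illustration, not the code from the pull request: the class names `FatalExitThreadFactory` and `FatalExitHandler`, the thread naming scheme, the injectable exit action, and the exit code are all assumptions made for the example. Only `ThreadFactory`, `Thread.UncaughtExceptionHandler`, and `Runtime.halt` are standard JDK APIs.

```java
import java.lang.Thread.UncaughtExceptionHandler;
import java.util.Objects;
import java.util.concurrent.ThreadFactory;
import java.util.concurrent.atomic.AtomicInteger;

/** Hypothetical sketch: a thread factory that installs a handler on every thread it creates. */
class FatalExitThreadFactory implements ThreadFactory {

    /** Hypothetical handler: reports the uncaught exception, then runs an exit action. */
    static final class FatalExitHandler implements UncaughtExceptionHandler {

        private final Runnable exitAction;

        FatalExitHandler() {
            // Assumption: halt() is used so that a hanging shutdown hook cannot
            // keep the broken process alive. The exit code 1 is arbitrary.
            this(() -> Runtime.getRuntime().halt(1));
        }

        /** The exit action is injectable so the handler can be exercised without killing the JVM. */
        FatalExitHandler(Runnable exitAction) {
            this.exitAction = Objects.requireNonNull(exitAction);
        }

        @Override
        public void uncaughtException(Thread t, Throwable e) {
            try {
                System.err.println("FATAL: uncaught exception in thread '"
                        + t.getName() + "': " + e);
            } finally {
                // Run the exit action even if reporting itself fails.
                exitAction.run();
            }
        }
    }

    private final AtomicInteger threadNumber = new AtomicInteger(1);
    private final String namePrefix;
    private final UncaughtExceptionHandler handler;

    FatalExitThreadFactory(String poolName, UncaughtExceptionHandler handler) {
        this.namePrefix = Objects.requireNonNull(poolName) + "-thread-";
        this.handler = Objects.requireNonNull(handler);
    }

    @Override
    public Thread newThread(Runnable r) {
        Thread t = new Thread(r, namePrefix + threadNumber.getAndIncrement());
        t.setDaemon(true);
        // Every thread of the pool gets the same handler.
        t.setUncaughtExceptionHandler(handler);
        return t;
    }
}
```

A pool would then be created with something like `Executors.newFixedThreadPool(4, new FatalExitThreadFactory("jobmanager-future", new FatalExitThreadFactory.FatalExitHandler()))`, where the pool name is again made up. Note that the handler only sees exceptions from tasks passed to `execute()`. Tasks passed to `submit()` have their exceptions captured in the returned `Future` instead.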

This message was sent by Atlassian JIRA