flink-issues mailing list archives

From "ASF GitHub Bot (JIRA)" <j...@apache.org>
Subject [jira] [Commented] (FLINK-5759) Set an UncaughtExceptionHandler for all Thread Pools in JobManager
Date Fri, 10 Feb 2017 10:00:48 GMT

    [ https://issues.apache.org/jira/browse/FLINK-5759?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15861023#comment-15861023 ]

ASF GitHub Bot commented on FLINK-5759:
---------------------------------------

Github user StefanRRichter commented on a diff in the pull request:

    https://github.com/apache/flink/pull/3290#discussion_r100501680
  
    --- Diff: flink-runtime/src/main/java/org/apache/flink/runtime/filecache/FileCache.java ---
    @@ -99,7 +99,8 @@ public FileCache(String[] tempDirectories) throws IOException {
     		this.shutdownHook = createShutdownHook(this, LOG);
     
     		this.entries = new HashMap<JobID, Map<String, Tuple4<Integer, File, Path, Future<Path>>>>();
    -		this.executorService = Executors.newScheduledThreadPool(10, ExecutorThreadFactory.INSTANCE);
    +		this.executorService = Executors.newScheduledThreadPool(10, 
    --- End diff --
    
    Is there any rationale behind the magic number 10, or do we use it only because it was 10 before?


> Set an UncaughtExceptionHandler for all Thread Pools in JobManager
> ------------------------------------------------------------------
>
>                 Key: FLINK-5759
>                 URL: https://issues.apache.org/jira/browse/FLINK-5759
>             Project: Flink
>          Issue Type: Bug
>          Components: JobManager
>    Affects Versions: 1.2.0
>            Reporter: Stephan Ewen
>            Assignee: Stephan Ewen
>             Fix For: 1.3.0
>
>
> Currently, the thread pools of the {{JobManager}} do not have any {{UncaughtExceptionHandler}}.
> While uncaught exceptions are rare (Flink handles exceptions aggressively in most places), when exceptions slip through in these threads (which execute future responses and delayed actions), the JobManager may be in an inconsistent state and not function properly any more.
> We should add a handler that results in a process kill in the case of uncaught exceptions. Letting the JobManager be restarted by the respective cluster framework is the only guaranteed way to be safe.
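
The handler proposed in the issue description could look roughly like the sketch below. It is an illustration under stated assumptions, not the change made in the pull request: the class name `FatalExitThreadFactory`, the `EXIT_CODE` value, and the injectable `exitAction` are all hypothetical, and Flink's own `ExecutorThreadFactory` may be structured differently. The exit action is a parameter so that the handler can be exercised without actually killing the JVM.

```java
import java.util.concurrent.ThreadFactory;
import java.util.concurrent.atomic.AtomicInteger;
import java.util.function.IntConsumer;

/** Hypothetical thread factory for illustration; not Flink's ExecutorThreadFactory. */
final class FatalExitThreadFactory implements ThreadFactory {

    /** Exit code chosen for this sketch only (an assumption). */
    static final int EXIT_CODE = 1;

    private final AtomicInteger counter = new AtomicInteger(1);
    private final String poolName;
    private final Thread.UncaughtExceptionHandler handler;

    FatalExitThreadFactory(String poolName, IntConsumer exitAction) {
        this.poolName = poolName;
        // Log the failure, then hand control to the exit action.
        this.handler = (thread, error) -> {
            System.err.println("FATAL: uncaught exception in thread '"
                    + thread.getName() + "': " + error);
            exitAction.accept(EXIT_CODE);
        };
    }

    @Override
    public Thread newThread(Runnable task) {
        Thread t = new Thread(task, poolName + "-thread-" + counter.getAndIncrement());
        t.setDaemon(true);
        // Every thread created for the pool carries the same handler.
        t.setUncaughtExceptionHandler(handler);
        return t;
    }

    /** Variant that halts the JVM so the cluster framework can restart the process. */
    static FatalExitThreadFactory haltingJvm(String poolName) {
        return new FatalExitThreadFactory(poolName, code -> Runtime.getRuntime().halt(code));
    }
}
```

One caveat with this approach: the handler only fires when the exception actually escapes the worker thread, as it does for tasks run via `execute()` on a pool from `Executors.newFixedThreadPool(...)`. Tasks passed to `submit()`, and all tasks on a scheduled pool, are wrapped in a future that captures the exception, so those failures have to be surfaced by inspecting the future instead.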



--
This message was sent by Atlassian JIRA
(v6.3.15#6346)
