flink-dev mailing list archives

From "Robert Metzger (JIRA)" <j...@apache.org>
Subject [jira] [Created] (FLINK-2079) Add watcher to YARN TM containers to detect stopped actor system
Date Fri, 22 May 2015 09:49:17 GMT
Robert Metzger created FLINK-2079:
-------------------------------------

             Summary: Add watcher to YARN TM containers to detect stopped actor system
                 Key: FLINK-2079
                 URL: https://issues.apache.org/jira/browse/FLINK-2079
             Project: Flink
          Issue Type: Improvement
          Components: TaskManager, YARN Client
    Affects Versions: 0.9
            Reporter: Robert Metzger
            Assignee: Robert Metzger


I experienced an OutOfMemoryError (caused by user code) while running Flink on YARN.
It seems that the TaskManager correctly detects the fatal error, but the JVM does not
shut down, so YARN won't bring up new containers.

Therefore, I want to start a thread in the YarnTaskManagerRunner that periodically (every
30 seconds) checks whether the actor system is still running. If it is not, the thread
calls System.exit(1).
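
A minimal sketch of such a watcher thread, under stated assumptions: the class name ActorSystemWatchdog, the start method, and its parameters are hypothetical and are not taken from the Flink code base. The liveness check and the shutdown action are passed in as callbacks so that the sketch stays independent of any particular Akka API.

```java
import java.util.function.BooleanSupplier;

/** Hypothetical watchdog sketch; names and structure are assumptions, not Flink's implementation. */
final class ActorSystemWatchdog {

    private ActorSystemWatchdog() {}

    /**
     * Starts a daemon thread that polls isRunning every intervalMillis milliseconds
     * and runs onStopped once the check reports that the system is no longer running.
     */
    static Thread start(BooleanSupplier isRunning, long intervalMillis, Runnable onStopped) {
        Thread watcher = new Thread(() -> {
            try {
                // Keep polling for as long as the actor system reports that it is alive.
                while (isRunning.getAsBoolean()) {
                    Thread.sleep(intervalMillis);
                }
                // The actor system has stopped: trigger the shutdown action.
                onStopped.run();
            } catch (InterruptedException e) {
                // Interrupted while sleeping: restore the flag and stop watching.
                Thread.currentThread().interrupt();
            }
        }, "actor-system-watchdog");
        // A daemon thread does not itself keep the JVM alive.
        watcher.setDaemon(true);
        watcher.start();
        return watcher;
    }
}
```

In the TaskManager container, this could be started with an interval of 30000 milliseconds, a check that asks the actor system whether it has terminated (the exact call depends on the Akka version in use), and () -> System.exit(1) as the shutdown action, so that the process exits and YARN can notice the container has gone.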



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)
