flink-issues mailing list archives

From "ASF GitHub Bot (JIRA)" <j...@apache.org>
Subject [jira] [Commented] (FLINK-5718) Handle JVM Fatal Exceptions in Tasks
Date Mon, 06 Feb 2017 15:46:41 GMT

    [ https://issues.apache.org/jira/browse/FLINK-5718?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15854227#comment-15854227 ]

ASF GitHub Bot commented on FLINK-5718:

GitHub user StephanEwen opened a pull request:


    [FLINK-5718] [core] TaskManagers exit the JVM on fatal exceptions.

    *This adds a feature requested by a user for production stability.*
    The TaskManager should not attempt to handle certain exceptions, because they indicate
    that the JVM is corrupt. When a task throws such an exception, the TaskManager
    forcefully and immediately exits the JVM.
    Optionally, an `OutOfMemoryError` can also be configured to cause such immediate JVM
    termination via the `taskmanager.jvm-exit-on-oom` config option.
    ### Tests
    This adds a test that tests the option and the actual process kill (via a spawned test
    ### Documentation
    This adds the `taskmanager.jvm-exit-on-oom` option to the `setup/config.md` docs.
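
The behavior described in this pull request can be sketched in Java roughly as follows. This is an illustration under stated assumptions, not the code in the PR: the class `FatalExitPolicy`, the `exitOnOom` flag (standing in for the value of `taskmanager.jvm-exit-on-oom`), the exit code, and the injectable `halter` are hypothetical names. `Runtime.halt` is shown as one way to terminate without running shutdown hooks; the message does not say which call the PR actually uses.

```java
import java.util.function.IntConsumer;

/** Hypothetical sketch: decides whether a Throwable from a task should kill the JVM. */
final class FatalExitPolicy {

    /** Hypothetical exit code; the source does not state which code is used. */
    static final int FATAL_EXIT_CODE = 1;

    /** Assumed to mirror the value of the taskmanager.jvm-exit-on-oom option. */
    private final boolean exitOnOom;

    /** Performs the actual termination; injectable so the policy can be tested. */
    private final IntConsumer halter;

    FatalExitPolicy(boolean exitOnOom, IntConsumer halter) {
        this.exitOnOom = exitOnOom;
        this.halter = halter;
    }

    /** Policy that really terminates the JVM, skipping shutdown hooks. */
    static FatalExitPolicy forProduction(boolean exitOnOom) {
        return new FatalExitPolicy(exitOnOom, code -> Runtime.getRuntime().halt(code));
    }

    /** OutOfMemoryError is fatal only when opted in; InternalError and UnknownError always are. */
    boolean shouldExit(Throwable t) {
        if (t instanceof OutOfMemoryError) {
            return exitOnOom;
        }
        return t instanceof InternalError || t instanceof UnknownError;
    }

    /** Invokes the halter for fatal errors; returns true if it was invoked. */
    boolean handle(Throwable t) {
        if (shouldExit(t)) {
            halter.accept(FATAL_EXIT_CODE);
            return true;
        }
        return false;
    }
}
```

Injecting the halter keeps the decision logic testable in-process, while the real process kill still needs a separately spawned JVM to verify.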

You can merge this pull request into a Git repository by running:

    $ git pull https://github.com/StephanEwen/incubator-flink exit_on_fatal_error

Alternatively you can review and apply these changes as the patch at:


To close this pull request, make a commit to your master/trunk branch
with (at least) the following in the commit message:

    This closes #3276

commit 21c08817554e5a66186afa83158ca9c6ac975ba4
Author: Stephan Ewen <sewen@apache.org>
Date:   2017-02-06T14:52:39Z

    [FLINK-5718] [core] TaskManagers exit the JVM on fatal exceptions.


> Handle JVM Fatal Exceptions in Tasks
> ------------------------------------
>                 Key: FLINK-5718
>                 URL: https://issues.apache.org/jira/browse/FLINK-5718
>             Project: Flink
>          Issue Type: Improvement
>          Components: Local Runtime
>            Reporter: Stephan Ewen
>            Assignee: Stephan Ewen
>
> The TaskManager catches and handles all types of exceptions right now (all {{Throwables}}). The intention behind that is:
>   - Many {{Error}} subclasses are recoverable for the TaskManagers, such as failure to load/link user code
>   - We want to give eager notifications to the JobManager in case something in a task goes wrong.
>
> However, there are some exceptions which should probably simply terminate the JVM, if caught in the task thread, because they may leave the JVM in a dysfunctional limbo state:
>   - {{OutOfMemoryError}}
>   - {{InternalError}}
>   - {{UnknownError}}
>   - {{ZipError}}
>
> These are basically the subclasses of {{VirtualMachineError}}, except for {{StackOverflowError}}, which is recoverable and usually recovered already by the time the exception has been thrown and the stack unwound.
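
The classification in the issue description can be expressed as a single predicate. The sketch below assumes that "fatal" means any `VirtualMachineError` other than `StackOverflowError`, as the description suggests; the class and method names are hypothetical and are not claimed to be Flink APIs.

```java
/** Hypothetical helper illustrating the classification from the issue description. */
final class FatalErrorClassifier {

    private FatalErrorClassifier() {}

    /**
     * Returns true for VirtualMachineError subclasses other than StackOverflowError.
     * This covers OutOfMemoryError, InternalError and UnknownError; ZipError is a
     * subclass of InternalError, so it is covered as well.
     */
    static boolean isJvmFatal(Throwable t) {
        return t instanceof VirtualMachineError && !(t instanceof StackOverflowError);
    }
}
```

A task thread would evaluate such a predicate inside its `catch (Throwable t)` block before attempting normal failure handling. Errors such as `LinkageError`, which the description treats as recoverable, fall through to the regular path.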
