flink-issues mailing list archives

From tillrohrmann <...@git.apache.org>
Subject [GitHub] flink pull request: Add [FLINK-1376] Add proper shared slot releas...
Date Mon, 12 Jan 2015 17:37:56 GMT
GitHub user tillrohrmann opened a pull request:


    Add [FLINK-1376] Add proper shared slot release in case of a fatal TaskManager failure

    This PR introduces SharedSlots as a special Slot type, so that they are released
properly when an Instance has been marked dead. This fixes the problem that a dead instance
which has not been shut down properly prevents a job from being removed from the system,
because the instance is not aware of the SubSlots.
    Adds test cases in which only the heartbeat is disabled, to check that the job is failed properly.
    @StephanEwen: It would be great if you could take a close look at the code because of the
delicate synchronization mechanism. What I've done in the end is to synchronize most of the
calls by passing them through the SlotSharingGroupAssignment.
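
For readers unfamiliar with the slot hierarchy, the idea described above can be sketched roughly as follows. This is a simplified illustration and not the code in this pull request: the class `SharedSlotReleaseSketch`, the nested `Assignment` class (a stand-in for SlotSharingGroupAssignment), and all method names are hypothetical. It only shows a shared slot that is an ordinary slot from the instance's point of view, and whose release goes through a single lock held by the assignment.

```java
import java.util.ArrayList;
import java.util.List;

// Hypothetical, simplified model; none of these names are taken from the Flink code base.
public class SharedSlotReleaseSketch {

    /** Base type: anything an instance hands out and must be able to release. */
    abstract static class Slot {
        private boolean released;
        boolean isReleased() { return released; }
        void markReleased() { released = true; }
        abstract void releaseSlot();
    }

    /** A slot that runs one task inside a shared slot. */
    static final class SubSlot extends Slot {
        @Override
        void releaseSlot() { markReleased(); }
    }

    /** Stand-in for SlotSharingGroupAssignment: every mutation takes its lock. */
    static final class Assignment {
        private final Object lock = new Object();

        /** Returns a new sub slot, or null if the shared slot was already released. */
        SubSlot addSubSlot(SharedSlot shared) {
            synchronized (lock) {
                if (shared.isReleased()) {
                    return null;
                }
                SubSlot sub = new SubSlot();
                shared.subSlots.add(sub);
                return sub;
            }
        }

        /** Releases all sub slots first, then the shared slot itself. */
        void releaseSharedSlot(SharedSlot shared) {
            synchronized (lock) {
                for (SubSlot sub : shared.subSlots) {
                    sub.releaseSlot();
                }
                shared.subSlots.clear();
                shared.markReleased();
            }
        }
    }

    /** A shared slot is itself a Slot, so the instance can release it like any other. */
    static final class SharedSlot extends Slot {
        final List<SubSlot> subSlots = new ArrayList<>();
        private final Assignment assignment;

        SharedSlot(Assignment assignment) { this.assignment = assignment; }

        @Override
        void releaseSlot() { assignment.releaseSharedSlot(this); }
    }

    /** Stand-in for an Instance (a TaskManager as seen by the scheduler). */
    static final class Instance {
        private final List<Slot> slots = new ArrayList<>();
        private boolean alive = true;

        synchronized void addSlot(Slot slot) { slots.add(slot); }

        /** Marks the instance dead and releases every slot it handed out. */
        synchronized void markDead() {
            alive = false;
            for (Slot slot : slots) {
                slot.releaseSlot();
            }
            slots.clear();
        }

        synchronized boolean isAlive() { return alive; }
    }

    public static void main(String[] args) {
        Assignment assignment = new Assignment();
        SharedSlot shared = new SharedSlot(assignment);
        SubSlot sub = assignment.addSubSlot(shared);

        Instance instance = new Instance();
        instance.addSlot(shared);
        instance.markDead();

        System.out.println("sub slot released: " + sub.isReleased());
        System.out.println("shared slot released: " + shared.isReleased());
    }
}
```

Routing both `addSubSlot` and `releaseSharedSlot` through one lock means that, in this sketch, a task cannot be added to a shared slot while that slot is being released.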

You can merge this pull request into a Git repository by running:

    $ git pull https://github.com/tillrohrmann/flink fixSharedSlotRelease

Alternatively you can review and apply these changes as the patch at:


To close this pull request, make a commit to your master/trunk branch
with (at least) the following in the commit message:

    This closes #300

commit 02004f98d1d76dc0683392690be38ab721bd6edd
Author: Till Rohrmann <trohrmann@apache.org>
Date:   2015-01-12T09:58:45Z

    [FLINK-1376] [runtime] Add proper shared slot release in case of a fatal TaskManager failure.


If your project is set up for it, you can reply to this email and have your
reply appear on GitHub as well. If your project does not have this feature
enabled and wishes so, or if the feature is enabled but not working, please
contact infrastructure at infrastructure@apache.org or file a JIRA ticket
with INFRA.
