flink-issues mailing list archives

From tillrohrmann <...@git.apache.org>
Subject [GitHub] flink pull request: [FLINK-2790] [yarn] [ha] Adds high availabilit...
Date Fri, 02 Oct 2015 12:39:08 GMT
GitHub user tillrohrmann opened a pull request:

    https://github.com/apache/flink/pull/1213

    [FLINK-2790] [yarn] [ha] Adds high availability support for Yarn

    Adds high availability support for Yarn by exploiting Yarn's ability to restart
a failed application master. The supported behaviour depends on the Hadoop version: each
version range listed below extends the behaviour of the preceding range.
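
    As a rough illustration of this cumulative, version-dependent behaviour, the following
sketch maps a Hadoop version string to the set of features that would be enabled, using the
version thresholds given in the headings of this description. The class `YarnHaFeatures`, the
enum `Feature` and the method `featuresFor` are hypothetical names chosen for this example;
they are not taken from the pull request, and the actual code may detect the available
functionality differently.

```java
import java.util.EnumSet;
import java.util.Set;

// Hypothetical helper, not part of the pull request: models which HA-related
// Yarn features are enabled for a given Hadoop version.
final class YarnHaFeatures {

    enum Feature {
        MAX_APP_ATTEMPTS,                   // 2.3.0 <= version
        KEEP_CONTAINERS_ACROSS_ATTEMPTS,    // 2.4.0 <= version
        ATTEMPT_FAILURES_VALIDITY_INTERVAL  // 2.6.0 <= version
    }

    // Returns the features for a version such as "2.4.1" or "2.6.0-SNAPSHOT".
    static Set<Feature> featuresFor(String hadoopVersion) {
        int[] v = parseMajorMinor(hadoopVersion);
        Set<Feature> features = EnumSet.noneOf(Feature.class);
        if (atLeast(v, 2, 3)) {
            features.add(Feature.MAX_APP_ATTEMPTS);
        }
        if (atLeast(v, 2, 4)) {
            features.add(Feature.KEEP_CONTAINERS_ACROSS_ATTEMPTS);
        }
        if (atLeast(v, 2, 6)) {
            features.add(Feature.ATTEMPT_FAILURES_VALIDITY_INTERVAL);
        }
        return features;
    }

    // Extracts the major and minor components; patch level and suffix are ignored.
    private static int[] parseMajorMinor(String version) {
        String[] parts = version.split("[.-]");
        if (parts.length < 2) {
            throw new IllegalArgumentException("Unexpected version: " + version);
        }
        return new int[] {Integer.parseInt(parts[0]), Integer.parseInt(parts[1])};
    }

    private static boolean atLeast(int[] v, int major, int minor) {
        return v[0] > major || (v[0] == major && v[1] >= minor);
    }

    public static void main(String[] args) {
        for (String version : new String[] {"2.3.0", "2.4.1", "2.6.0"}) {
            System.out.println(version + " -> " + featuresFor(version));
        }
    }
}
```

    Each later version range only adds a feature to the set, which mirrors the "superset"
structure of the sections below.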
    
    ### 2.3.0 <= version < 2.4.0
    
    * Sets the number of application attempts to the configuration value `yarn.application-attempts`.
This means that the application can be restarted `yarn.application-attempts` times before Yarn
fails the application. In case of an application master failure, all TaskManager containers
are killed as well.
    
    ### 2.4.0 <= version < 2.6.0
    
    * Additionally, keeps containers across application attempts. This avoids killing the
TaskManager containers when the application master fails. As a result, start-up after a
failure is faster and the user does not have to wait for the container resources to be
obtained again.
    
    ### 2.6.0 <= version
    
    * Sets the attempt failures validity interval to the Akka timeout value. With this
interval, an application is only killed after the system has seen the maximum number of
application attempts within one interval. This prevents a long-running job from depleting
its application attempts.
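
    To make the effect of the validity interval concrete, here is a minimal sketch of the
counting rule described above: only failures that happened within the last interval count
against the maximum number of attempts. The class `AttemptFailureWindow` and its method
`recordFailure` are hypothetical and exist only for this illustration; this is not Yarn's
implementation, and the handling of a failure exactly at the interval boundary is an
assumption.

```java
import java.util.ArrayDeque;
import java.util.Deque;

// Hypothetical model of an attempt failures validity interval (not Yarn code).
final class AttemptFailureWindow {

    private final int maxAttempts;
    private final long validityIntervalMillis;
    // Timestamps of failures that are still inside the validity interval.
    private final Deque<Long> failureTimes = new ArrayDeque<>();

    AttemptFailureWindow(int maxAttempts, long validityIntervalMillis) {
        this.maxAttempts = maxAttempts;
        this.validityIntervalMillis = validityIntervalMillis;
    }

    // Records a failed attempt at the given time (timestamps must not decrease).
    // Returns true if another attempt may be started, false if the application
    // has used up its attempts within the current interval.
    boolean recordFailure(long nowMillis) {
        failureTimes.addLast(nowMillis);
        // Assumption: a failure older than the interval no longer counts.
        while (nowMillis - failureTimes.peekFirst() > validityIntervalMillis) {
            failureTimes.removeFirst();
        }
        return failureTimes.size() < maxAttempts;
    }

    public static void main(String[] args) {
        AttemptFailureWindow window = new AttemptFailureWindow(2, 1_000L);
        System.out.println(window.recordFailure(0L));     // first failure
        System.out.println(window.recordFailure(5_000L)); // much later failure
        System.out.println(window.recordFailure(5_500L)); // second failure within one interval
    }
}
```

    With two allowed attempts, two failures that are far apart never exhaust the budget,
whereas two failures within the same interval do. This is why a long-running job with
occasional application master failures is not killed.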
    
    This PR also refactors the different Yarn components to allow testing actors to be
started within Yarn. Furthermore, the `JobManager` start-up logic is slightly extended to allow
code reuse in the `ApplicationMasterBase`.
    
    The HA functionality is tested via the `YARNHighAvailabilityITCase`, in which an application
master is killed multiple times. Each time, the test checks that the single TaskManager successfully
reconnects to the newly started `YarnJobManager`. In case of version `2.3.0`, the `TaskManager`
is restarted.

You can merge this pull request into a Git repository by running:

    $ git pull https://github.com/tillrohrmann/flink yarnHA

Alternatively you can review and apply these changes as the patch at:

    https://github.com/apache/flink/pull/1213.patch

To close this pull request, make a commit to your master/trunk branch
with (at least) the following in the commit message:

    This closes #1213
    
----
commit 1a18172ae69eb576638704f8e143a921aa8630d5
Author: Till Rohrmann <trohrmann@apache.org>
Date:   2015-09-01T14:35:48Z

    [FLINK-2790] [yarn] [ha] Adds high availability support for Yarn

commit 5359676556d16610303d4f36fcbe5320ef4e6643
Author: Till Rohrmann <trohrmann@apache.org>
Date:   2015-09-23T15:42:57Z

    Refactors JobManager's start actors method to be reusable

commit d6a47cd8ad265c5122d1a67c09773cbc5a491261
Author: Till Rohrmann <trohrmann@apache.org>
Date:   2015-09-24T12:55:18Z

    Yarn refactoring to introduce yarn testing functionality

commit f9578f136dd41cd9829d712f7c62a59c9ea8e194
Author: Till Rohrmann <trohrmann@apache.org>
Date:   2015-09-28T16:21:30Z

    Added support for testing yarn cluster. Extracted JobManager's and TaskManager's testing
messages into stackable traits.

commit dbfa16438ad9d7d61e8d1a582c8cd1de9352078e
Author: Till Rohrmann <trohrmann@apache.org>
Date:   2015-09-29T15:05:01Z

    Implemented YarnHighAvailabilityITCase using Akka messages for synchronization.

----


---
If your project is set up for it, you can reply to this email and have your
reply appear on GitHub as well. If your project does not have this feature
enabled and wishes so, or if the feature is enabled but not working, please
contact infrastructure at infrastructure@apache.org or file a JIRA ticket
with INFRA.
---
