activemq-dev mailing list archives

From jbertram <...@git.apache.org>
Subject [GitHub] activemq-artemis pull request: ARTEMIS-256 orchestrate failback de...
Date Tue, 20 Oct 2015 18:21:50 GMT
GitHub user jbertram opened a pull request:

    https://github.com/apache/activemq-artemis/pull/204

    ARTEMIS-256 orchestrate failback deterministically

    The failback process needs to be deterministic rather than relying on various
    incarnations of Thread.sleep() at crucial points. Important aspects of this
    change include:
    
    1) Make the initial replication synchronization process block at the very
    last step and wait for a response from the replica to ensure the replica has
    all the necessary data. This is a critical piece of knowledge during the
    failback process because it allows the soon-to-become-backup server to know
    for sure when it can shut itself down and allow the soon-to-become-live
    server to take over. Also, introduce a new configuration element called
    "initial-replication-sync-timeout" to control how long this blocking will last.
    
    2) Set the state of the server as 'LIVE' only after the server is fully
    started. This is necessary because once the soon-to-be-backup server shuts
    down it needs to know that the soon-to-be-live server has started fully before
    it restarts itself as the new backup. If the soon-to-be-backup server restarts
    before the soon-to-be-live is fully started then it won't actually become a
    backup server but instead will become a live server which will break the
    failback process.
    
    3) Wait to receive the announcement of a backup server before failing-back.
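
For readers unfamiliar with the pattern in point 1, here is a minimal sketch of a bounded, event-driven wait replacing a fixed Thread.sleep(). It is an illustration only and is not taken from the pull request: the class InitialSyncGate and its method names are hypothetical, and the constructor argument merely stands in for a value such as the one configured via "initial-replication-sync-timeout".

```java
import java.util.concurrent.CountDownLatch;
import java.util.concurrent.TimeUnit;

/**
 * Hypothetical helper (not part of the Artemis code base): blocks the
 * synchronizing server until the replica acknowledges, or a timeout expires.
 */
final class InitialSyncGate {
    // Released exactly once, when the replica reports it has all the data.
    private final CountDownLatch replicaAck = new CountDownLatch(1);
    private final long timeoutMillis;

    InitialSyncGate(long timeoutMillis) {
        this.timeoutMillis = timeoutMillis;
    }

    /** Assumed to be invoked by whatever handles the replica's response. */
    void onReplicaSynchronized() {
        replicaAck.countDown();
    }

    /**
     * Waits for the acknowledgement instead of sleeping for a fixed period.
     * Returns true only if the replica answered before the timeout.
     */
    boolean awaitReplica() {
        try {
            return replicaAck.await(timeoutMillis, TimeUnit.MILLISECONDS);
        } catch (InterruptedException e) {
            // Preserve the interrupt for callers and treat it as "not synchronized".
            Thread.currentThread().interrupt();
            return false;
        }
    }
}
```

In this sketch the caller would proceed with shutting itself down only when awaitReplica() returns true; a false result means the replica's state is unknown, so the caller should not assume the replica is ready to take over.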

You can merge this pull request into a Git repository by running:

    $ git pull https://github.com/jbertram/activemq-artemis ARTEMIS-256

Alternatively you can review and apply these changes as the patch at:

    https://github.com/apache/activemq-artemis/pull/204.patch

To close this pull request, make a commit to your master/trunk branch
with (at least) the following in the commit message:

    This closes #204
    
----
commit 908776eff3e5410b851aaaa1f62f7db764187acc
Author: jbertram <jbertram@apache.org>
Date:   2015-10-14T17:07:17Z

    ARTEMIS-256 orchestrate failback deterministically
    
    The failback process needs to be deterministic rather than relying on various
    incarnations of Thread.sleep() at crucial points. Important aspects of this
    change include:
    
    1) Make the initial replication synchronization process block at the very
    last step and wait for a response from the replica to ensure the replica has
    all the necessary data. This is a critical piece of knowledge during the
    failback process because it allows the soon-to-become-backup server to know
    for sure when it can shut itself down and allow the soon-to-become-live
    server to take over. Also, introduce a new configuration element called
    "initial-replication-sync-timeout" to control how long this blocking will last.
    
    2) Set the state of the server as 'LIVE' only after the server is fully
    started. This is necessary because once the soon-to-be-backup server shuts
    down it needs to know that the soon-to-be-live server has started fully before
    it restarts itself as the new backup. If the soon-to-be-backup server restarts
    before the soon-to-be-live is fully started then it won't actually become a
    backup server but instead will become a live server which will break the
    failback process.
    
    3) Wait to receive the announcement of a backup server before failing-back.

----


---
If your project is set up for it, you can reply to this email and have your
reply appear on GitHub as well. If your project does not have this feature
enabled and wishes so, or if the feature is enabled but not working, please
contact infrastructure at infrastructure@apache.org or file a JIRA ticket
with INFRA.
---