geode-issues mailing list archives

From "ASF subversion and git services (JIRA)" <>
Subject [jira] [Commented] (GEODE-3024) race condition between server and restarted locator preparing membership views
Date Wed, 14 Jun 2017 18:06:00 GMT


ASF subversion and git services commented on GEODE-3024:

Commit 796a2e25acef78aa974d4432bf8f551b230f8c93 in geode's branch refs/heads/release/1.2.0
from [~bschuchardt]

GEODE-3024 race condition between server locator preparing membership views

If a locator is preparing a conflicting membership view, a cache server
now abandons preparation of its own view and pauses before retrying.
This gives the locator time to gather information from the cache server's
view (which it receives in acks while preparing its own view),
incorporate it into a new view and send that view out.  When the cache
server installs the new view from the locator it shuts down its
ViewCreator thread.

(cherry picked from commit 31b72ba48b2dda95954b30c14ae62a8730065b3f)
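
The behavior described in the commit message can be sketched roughly as
follows. This is an illustrative model only, not the code from the commit:
the class ViewCreatorSketch, its methods, the State enum and the pause
length are all hypothetical names chosen for the example. It shows a cache
server abandoning its prepared view and pausing when acks report that a
locator is preparing a conflicting view, and stopping once it installs a
view from the locator.

```java
// Hypothetical sketch; none of these names come from the Geode code base.
class ViewCreatorSketch {

    enum State { PREPARING, PAUSED, SHUT_DOWN }

    private State state = State.PREPARING;
    private final long pauseMillis;          // assumed pause length, not Geode's value
    private int abandonedPreparations = 0;

    ViewCreatorSketch(long pauseMillis) {
        this.pauseMillis = pauseMillis;
    }

    /**
     * Called after acks for this server's view-prep message are collected.
     * Returns true if the prepared view may be installed, false if the
     * preparation was abandoned or the creator has been shut down.
     */
    boolean onPrepareAcks(boolean locatorIsPreparingConflictingView) {
        if (state == State.SHUT_DOWN) {
            return false;
        }
        if (locatorIsPreparingConflictingView) {
            // Abandon this view and pause so the locator can merge the
            // information it received in acks and send out its own view.
            abandonedPreparations++;
            state = State.PAUSED;
            try {
                Thread.sleep(pauseMillis);
            } catch (InterruptedException e) {
                Thread.currentThread().interrupt();
            }
            state = State.PREPARING;         // ready to retry later
            return false;
        }
        return true;
    }

    /** Called when this server installs a view created by the locator. */
    void onViewInstalledFromLocator() {
        state = State.SHUT_DOWN;
    }

    State state() {
        return state;
    }

    int abandonedPreparations() {
        return abandonedPreparations;
    }

    public static void main(String[] args) {
        ViewCreatorSketch creator = new ViewCreatorSketch(50);
        boolean install = creator.onPrepareAcks(true);
        System.out.println("install after conflict: " + install);
        creator.onViewInstalledFromLocator();
        System.out.println("state: " + creator.state());
    }
}
```

The point of the pause in this sketch is that only one side keeps sending
view-prep messages while the other waits, so the two processes stop
answering each other's views with new ones.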

> race condition between server and restarted locator preparing membership views
> ------------------------------------------------------------------------------
>                 Key: GEODE-3024
>                 URL:
>             Project: Geode
>          Issue Type: Bug
>          Components: membership
>            Reporter: Bruce Schuchardt
>             Fix For: 1.2.0
> When a locator is restarted and recovers from disk, it will try to take over the role
> of membership coordinator for the cluster if it finds that the current coordinator is a cache server.
> If the cache server is in the process of sending out a new view, it may get into a race with
> the locator in sending out view preparation messages.
> The locator will send out a view-prep message and the server will also send one.  Responses
> to each view-prep message will include the conflicting view, and each of the two processes will
> create a new view and send it out.  This repeats ad infinitum.
> This problem was observed in a system that was shutting down at the same time a locator
> was being restarted.

This message was sent by Atlassian JIRA