geode-issues mailing list archives

From "ASF subversion and git services (JIRA)" <>
Subject [jira] [Commented] (GEODE-6369) Cache-creation failure after a successful auto-reconnect causes subsequent NPE
Date Tue, 19 Feb 2019 22:57:00 GMT


ASF subversion and git services commented on GEODE-6369:

Commit fca3d14fd69c34ae54524dda4807a09d533b82f1 in geode's branch refs/heads/release/1.9.0
from Bruce Schuchardt

GEODE-6369 Cache-creation failure after a successful auto-reconnect causes subsequent NPE

If an error occurs while rebuilding the cache on auto-reconnect and we
can't continue, we should throw an exception to any thread waiting for
the reconnect to complete.
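
A minimal sketch of that idea in Java, not the code from this commit:
threads block on a latch-like object, and the reconnect thread finishes
it either with success or with a failure cause. On failure every waiter
gets an exception instead of returning as if the reconnect had worked.
ReconnectLatch, ReconnectFailedException and their methods are
hypothetical names chosen for illustration only.

```java
import java.util.concurrent.TimeUnit;
import java.util.concurrent.locks.Condition;
import java.util.concurrent.locks.ReentrantLock;

// Hypothetical types for illustration; these are not Geode classes.
class ReconnectFailedException extends RuntimeException {
  ReconnectFailedException(String message, Throwable cause) {
    super(message, cause);
  }
}

class ReconnectLatch {
  private final ReentrantLock lock = new ReentrantLock();
  private final Condition done = lock.newCondition();
  private boolean finished;
  private Throwable failure;

  // Called by the reconnect thread once the cache has been rebuilt.
  void succeed() {
    finish(null);
  }

  // Called by the reconnect thread when rebuilding the cache failed.
  void fail(Throwable cause) {
    finish(cause);
  }

  private void finish(Throwable cause) {
    lock.lock();
    try {
      if (finished) {
        return; // the first reported outcome wins
      }
      finished = true;
      failure = cause;
      done.signalAll(); // wake every waiting thread
    } finally {
      lock.unlock();
    }
  }

  // Returns true if the reconnect succeeded and false on timeout.
  // Throws ReconnectFailedException if the reconnect was reported as failed.
  boolean await(long timeout, TimeUnit unit) {
    long nanos = unit.toNanos(timeout);
    lock.lock();
    try {
      while (!finished) {
        if (nanos <= 0L) {
          return false;
        }
        nanos = done.awaitNanos(nanos);
      }
      if (failure != null) {
        throw new ReconnectFailedException(
            "cache could not be rebuilt after auto-reconnect", failure);
      }
      return true;
    } catch (InterruptedException e) {
      Thread.currentThread().interrupt(); // preserve the interrupt status
      throw new ReconnectFailedException("interrupted while waiting", e);
    } finally {
      lock.unlock();
    }
  }
}
```

In this sketch a waiter can tell "still reconnecting" (false), "done"
(true) and "gave up" (exception) apart, which is the distinction the
paragraph above asks for.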

If we're unable to contact the cluster configuration service, we do not
terminate auto-reconnect attempts.

New members are now only allowed to join after view preparation has
completed.  This reduces the number of "surprise members" and also
ensures that any old member IDs have been removed from the view.

We now only attempt to use findCoordinatorFromView multiple times if the
view actually changes.  Otherwise we contact locators again to see if
there are new registrants.

Fixing the above exposed other problems in auto-reconnect:

* Messages were being thrown away by the location service quorum checker
during auto-reconnect.  Some of these were "join" messages that needed
to be delivered to the new membership service.

* Registrants weren't being removed from the recovered membership view
in the locator.  This confused restarting nodes because the recovered
membership view had stale info in it that they should not use.

* Locator service restarts were hanging because profile interchange was
being done under synchronization.
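
The last item is a common hang pattern: a blocking exchange with other
members is made while a monitor is held. A minimal Java sketch of the
usual remedy follows, assuming a hypothetical ServiceRegistry class
(none of these names come from Geode, and this is not the change made in
the commit). The lock is held only long enough to copy the state, and
the potentially blocking exchange runs after it has been released.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.function.Consumer;

// Hypothetical registry for illustration; not a Geode class.
class ServiceRegistry {
  private final Object lock = new Object();
  private final List<String> services = new ArrayList<>();

  void register(String name) {
    synchronized (lock) {
      services.add(name);
    }
  }

  // Exposed only so callers can check that the lock is not held.
  boolean lockHeldByCurrentThread() {
    return Thread.holdsLock(lock);
  }

  // exchangeProfile stands in for a slow or blocking remote exchange.
  void restartAll(Consumer<String> exchangeProfile) {
    List<String> snapshot;
    synchronized (lock) {
      // Hold the lock only while copying the current state.
      snapshot = new ArrayList<>(services);
    }
    // The exchange runs without the lock, so other threads that need it
    // (for example to register a service) are not blocked behind it.
    for (String service : snapshot) {
      exchangeProfile.accept(service);
    }
  }
}
```

The trade-off in this sketch is that the exchange works from a snapshot,
so services registered while it runs are not included in that pass.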

> Cache-creation failure after a successful auto-reconnect causes subsequent NPE
> ------------------------------------------------------------------------------
>                 Key: GEODE-6369
>                 URL:
>             Project: Geode
>          Issue Type: Bug
>          Components: membership
>            Reporter: Bruce Schuchardt
>            Priority: Major
>              Labels: pull-request-available
>             Fix For: 1.9.0
>          Time Spent: 0.5h
>  Remaining Estimate: 0h
> If your server auto-reconnects but there is a problem recreating the cache, the JGroups
> channel used for auto-reconnect is closed.  This causes an NPE when the server makes another
> auto-reconnect attempt.
> The server should instead just log the problem and shut down, since future attempts to
> recreate the cache will probably run into the same issue.
