geode-issues mailing list archives

Site index · List index
Message view « Date » · « Thread »
Top « Date » · « Thread »
From "ASF subversion and git services (JIRA)" <j...@apache.org>
Subject [jira] [Commented] (GEODE-77) Replace JGroups 2.2.9
Date Thu, 05 Nov 2015 19:08:27 GMT

    [ https://issues.apache.org/jira/browse/GEODE-77?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14992265#comment-14992265
] 

ASF subversion and git services commented on GEODE-77:
------------------------------------------------------

Commit f3034be681dca37aca59710df3d44794d22d0f4e in incubator-geode's branch refs/heads/feature/GEODE-77
from [~bschuchardt]
[ https://git-wip-us.apache.org/repos/asf?p=incubator-geode.git;h=f3034be ]

GEODE-77: bug fixes

GMSHealthMonitorJUnitTest was incorrectly using Mockito's any() when it should
have used isA().  Fixing this exposed a lot of problems in the health monitor
that this checkin addresses.  I've also renamed a number of entities so that we
now have more uniform use of the term "heartbeat" instead of "check".

GMSHealthMonitor now has a positive heartbeat sender thread that determines
who might be watching it and sends unsolicited heartbeats to those members.

GMSHealthMonitor now sends the viewID of its membership ID in TCP/IP
health checks.  This enables the receiver to differentiate between the
received UUID/viewID and its own information when it is a reconnected
member (using auto-reconnect).  The response threads are now also moved
to a cached thread-pool to decrease the cost of these checks.  Responses
now have soLinger set on them (experimental) because I was seeing a lot
of checks fail with EOF even though the member wrote an OK status.

The health monitor now uses suspectMembersInView to avoid suspecting the
same member over and over again.  This means that it can't be used to
avoid duplicate final-checks.  I've also disabled the collection thread for
suspect events because it was adding unnecessary delay in initiating
final-checks on crashed members and I have yet to see it collect more than
one event.

SuspectMembersMessage processing now checks to see if the receiver is the
target of the suspicion and, if so, send a heartbeat to the sender.  This
seems to happen a lot when the membership coordinator is a locator because
the locator doesn't push operations out to other members very often.  The
positive heartbeat sender will also help with this.

This change-set also turns off the JGroups thread pools because they were found
to be causing our performance problem.  This exposed a bug in JGroups that they
are fixing, but for now there is a workaround in StatRecorder.  Along with the
removal of thread pools we now need to pass messages through
handleOrDeferMessage() in GMSMembershipManager since processMessage() can be
blocked during initialization, causing a new process to time out trying to join
the distributed system.

GMSJoinLeave was not setting the failure detection ports on a new view if it
abandoned a view that it could not prepare.

The Connection class had some incorrect checks for shutdown conditions when
the shared/ordered connection to another member is shut down.  This should
improve our detection time for crashed members.


> Replace JGroups 2.2.9
> ---------------------
>
>                 Key: GEODE-77
>                 URL: https://issues.apache.org/jira/browse/GEODE-77
>             Project: Geode
>          Issue Type: Bug
>            Reporter: Bruce Schuchardt
>            Assignee: Bruce Schuchardt
>            Priority: Blocker
>             Fix For: 1.0.0-incubating
>
>         Attachments: GEODE-MembershipManagerFunctionalSpecification-130715-1604-29054.pdf
>
>
> The JGroups 2.2.9 sources that are currently included in Geode must be replaced in order
for Geode to leave incubation.  A wiki document has been created to investigate alternatives.
> https://cwiki.apache.org/confluence/display/GEODE/Replacing+JGroups+2.2.9



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)

Mime
View raw message