curator-dev mailing list archives

From "Running Fly (JIRA)" <>
Subject [jira] [Created] (CURATOR-320) Discovery reregister triggered even if retry policy succeeds.
Date Fri, 29 Apr 2016 18:39:12 GMT
Running Fly created CURATOR-320:

             Summary: Discovery reregister triggered even if retry policy succeeds.
                 Key: CURATOR-320
             Project: Apache Curator
          Issue Type: Bug
          Components: Client, Framework
    Affects Versions: 2.10.0, TBD
         Environment: 3-server quorum running on individual AWS boxes.
Session timeout set to 1-2 min on most clients.
            Reporter: Running Fly
             Fix For: TBD

    ServiceDiscoveryImpl.reRegisterServices() can be triggered by two ConnectionState events:
RECONNECTED and CONNECTED. In both cases the reRegisterServices() method runs on the
ConnectionStateManager thread. If a connection drops while reRegisterServices() is running, it is
recovered by the retry policy. However, the resulting ConnectionState events, SUSPENDED followed
by RECONNECTED, are queued but not fired until reRegisterServices() completes (the
ConnectionStateManager thread fires these events but is in use). When it does complete, the
RECONNECTED event in the queue fires and reRegisterServices() runs again.
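
    The queuing effect described above can be reproduced in isolation. The following is a
minimal sketch using only the JDK, not Curator itself: a single-threaded executor stands in for
the ConnectionStateManager thread, and the class name, the State enum and simulate() are
hypothetical names chosen for illustration. It assumes the listener reacts to CONNECTED and
RECONNECTED exactly as the report describes.

```java
import java.util.concurrent.CountDownLatch;
import java.util.concurrent.ExecutorService;
import java.util.concurrent.Executors;
import java.util.concurrent.TimeUnit;
import java.util.concurrent.atomic.AtomicInteger;
import java.util.function.Consumer;

// Hypothetical stand-alone model of the event queue; no Curator classes are used.
class ReRegisterQueueDemo {
    enum State { CONNECTED, SUSPENDED, RECONNECTED }

    // Returns how many times the simulated re-registration ran.
    static int simulate() throws InterruptedException {
        // One thread delivers every state event, like the ConnectionStateManager thread.
        ExecutorService eventThread = Executors.newSingleThreadExecutor();
        AtomicInteger reRegisterRuns = new AtomicInteger();
        CountDownLatch firstRunStarted = new CountDownLatch(1);
        CountDownLatch blipQueued = new CountDownLatch(1);

        // Stand-in for the listener that calls reRegisterServices().
        Consumer<State> listener = state -> {
            if (state == State.CONNECTED || state == State.RECONNECTED) {
                int run = reRegisterRuns.incrementAndGet();
                if (run == 1) {
                    firstRunStarted.countDown();
                    try {
                        // The first run is still busy while the connection drops and
                        // recovers (in the report, the retry policy handles this).
                        blipQueued.await();
                    } catch (InterruptedException e) {
                        Thread.currentThread().interrupt();
                    }
                }
            }
        };

        eventThread.submit(() -> listener.accept(State.CONNECTED));
        firstRunStarted.await();

        // The event thread is occupied, so these two events wait in its queue.
        eventThread.submit(() -> listener.accept(State.SUSPENDED));
        eventThread.submit(() -> listener.accept(State.RECONNECTED));
        blipQueued.countDown();

        eventThread.shutdown();
        eventThread.awaitTermination(5, TimeUnit.SECONDS);
        return reRegisterRuns.get();
    }

    public static void main(String[] args) throws InterruptedException {
        System.out.println("re-registration ran " + simulate() + " time(s)");
    }
}
```

    In this model the first run finishes successfully, yet the stale RECONNECTED event still
triggers a second run once the event thread is free, which is the redundant re-registration
reported here.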
    When ZooKeeper's server connection is interrupted, all of the clients simultaneously
call reRegisterServices(). This overloads the server with requests, causing connections to
time out and reset, which queues up more RECONNECTED events. This state can persist indefinitely.
    Because reRegisterServices() will most likely receive a NodeExistsException, it deletes
and recreates the node. This effectively causes the services to thrash up and down, wreaking
havoc on our service dependency chain.
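
    To make the thrashing concrete, here is a rough sketch of the delete-and-recreate pattern
as seen by a watcher. It is a toy model, not Curator code: FakeStore is a hypothetical
in-memory stand-in for the ZooKeeper node tree, IllegalStateException stands in for
NodeExistsException, and the path is made up for illustration.

```java
import java.util.ArrayList;
import java.util.HashSet;
import java.util.List;
import java.util.Set;

// Hypothetical model of the delete-and-recreate behaviour; no ZooKeeper involved.
class ReRegisterThrashDemo {
    // In-memory stand-in for the node tree plus a log of what a watcher would observe.
    static final class FakeStore {
        final Set<String> nodes = new HashSet<>();
        final List<String> watcherLog = new ArrayList<>();

        void create(String path) {
            if (!nodes.add(path)) {
                // Stand-in for NodeExistsException.
                throw new IllegalStateException("node exists: " + path);
            }
            watcherLog.add("UP " + path);
        }

        void delete(String path) {
            if (nodes.remove(path)) {
                watcherLog.add("DOWN " + path);
            }
        }
    }

    // Models the reported behaviour: if the node already exists, delete and recreate it.
    static void reRegister(FakeStore store, String path) {
        try {
            store.create(path);
        } catch (IllegalStateException nodeExists) {
            store.delete(path);
            store.create(path);
        }
    }

    // Registers once, then applies the given number of redundant re-registrations.
    static List<String> simulate(int redundantRuns) {
        FakeStore store = new FakeStore();
        String path = "/services/example/instance-1"; // hypothetical path
        reRegister(store, path);
        for (int i = 0; i < redundantRuns; i++) {
            reRegister(store, path);
        }
        return store.watcherLog;
    }

    public static void main(String[] args) {
        simulate(2).forEach(System.out::println);
    }
}
```

    Under these assumptions every redundant run adds a DOWN entry followed by an UP entry to
the watcher log, so dependants see the service disappear and reappear even though it never
stopped.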

This message was sent by Atlassian JIRA
