curator-dev mailing list archives

Site index · List index
Message view « Date » · « Thread »
Top « Date » · « Thread »
From "Jordan Zimmerman (JIRA)" <j...@apache.org>
Subject [jira] [Commented] (CURATOR-367) Curator may deliver RECONNECTED before LOST in case of session expiry
Date Tue, 17 Jan 2017 13:40:26 GMT

    [ https://issues.apache.org/jira/browse/CURATOR-367?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15826061#comment-15826061
] 

Jordan Zimmerman commented on CURATOR-367:
------------------------------------------

Please submit the PR and we can review it from there.

> Curator may deliver RECONNECTED before LOST in case of session expiry
> ---------------------------------------------------------------------
>
>                 Key: CURATOR-367
>                 URL: https://issues.apache.org/jira/browse/CURATOR-367
>             Project: Apache Curator
>          Issue Type: Bug
>          Components: Client
>    Affects Versions: 2.11.1
>            Reporter: Zoltan Szekeres
>
> h2. Behaviour:
> We saw our code blocked at client.blockUntilConnected() after reconnected on session
expiry.
> h2. Possible reason:
> After receiving a session expired event ConnectionState first resets the connection then
notifies the parent watchers, where the CuratorEvent is created. In this case it seems the
execution of the first zookeeper event thread was delayed before calling the parent watchers.
Meanwhile a new zookeeper event thread was created due to calling reset and this new thread
sent the SyncConnected event earlier than SessionExpired was sent to parent watchers in the
first thread. This resulted ConnectionStateListener instances seeing the RECONNECTED before
the LOST.
> h2. Logs:
> 2016-11-17T20:23:28.527Z [Thread-0-SendThread(]  INFO              ClientCnxn: Opening
socket connection to server _
> 2016-11-17T20:23:28.535Z [Thread-0-SendThread(]  INFO              ClientCnxn: Socket
connection established to _, initiating session
> 2016-11-17T20:23:28.576Z [Thread-0-SendThread(]  INFO              ClientCnxn: Unable
to reconnect to ZooKeeper service, session 0xc585ba1e7b6adc2 has expired, closing socket connection
> 2016-11-17T20:23:28.576Z [Thread-0-EventThread]  WARN         ConnectionState: Session
expired event received
> 2016-11-17T20:23:28.673Z [Thread-0-EventThread]  INFO               ZooKeeper: Initiating
client connection, connectString=_ sessionTimeout=30000 watcher=org.apache.curator.ConnectionState@6ddf3f9d
> 2016-11-17T20:23:28.691Z [Thread-0-SendThread(]  INFO              ClientCnxn: Opening
socket connection to server _
> 2016-11-17T20:23:28.693Z [Thread-0-SendThread(]  INFO              ClientCnxn: Socket
connection established to _, initiating session
> 2016-11-17T20:23:28.701Z [Thread-0-SendThread(]  INFO              ClientCnxn: Session
establishment complete on server _, sessionid = 0x2585ba1e69ffeca, negotiated timeout = 30000
> 2016-11-17T20:23:28.701Z [Thread-0-EventThread]  INFO  ConnectionStateManager: State
change: RECONNECTED
> 2016-11-17T20:23:28.715Z [Thread-0-EventThread]  INFO  ConnectionStateManager: State
change: LOST
> 2016-11-17T20:23:28.715Z [Thread-0-EventThread]  INFO              ClientCnxn: EventThread
shut down
> h2. Reproduction:
> I was only able to reproduce the behaviour by adding artificial Thread.sleep in ConnectionState#process
before calling the parent watchers if the event is session expired.
> {code:title=ConnectionState#process}
> @Override
> public void process(WatchedEvent event)
> {
> 	if ( LOG_EVENTS )
> 	{
> 		log.debug("ConnectState watcher: " + event);
> 	}
> 	if ( event.getType() == Watcher.Event.EventType.None )
> 	{
> 		boolean wasConnected = isConnected.get();
> 		boolean newIsConnected = checkState(event.getState(), wasConnected);
> 		if ( newIsConnected != wasConnected )
> 		{
> 			isConnected.set(newIsConnected);
> 			connectionStartMs = System.currentTimeMillis();
> 		}
> 	}
> 	if (event.getState() == KeeperState.Expired)
> 	{
> 		System.err.println("::> sleep in ConnectionState#process");
> 		try {
> 			Thread.sleep(1000);
> 		} catch (InterruptedException e) {}
> 	}
> 	for ( Watcher parentWatcher : parentWatchers )
> 	{
> 		TimeTrace timeTrace = new TimeTrace("connection-state-parent-process", tracer.get());
> 		parentWatcher.process(event);
> 		timeTrace.commit();
> 	}
> }
> {code}
> h2. Some ideas for fix:
>  * Add the event handling and calling parent watchers into a synchronized block.
>  * Change the order of handling watched event and calling parent watchers (I'm not aware
of the behaviour implications of this).
>  * Move only calling reset to the end of the method "process".



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)

Mime
View raw message