river-dev mailing list archives

From "Christopher Dolan" <christopher.do...@avid.com>
Subject client hang in com.sun.jini.jeri.internal.mux.Mux.start()
Date Fri, 29 Apr 2011 18:40:12 GMT
I've experienced occasional cases where clients get stuck in the
following block of code in Mux.start. Has anyone experienced this
problem? I have a proposed solution below. Has anyone thought about a
similar solution already?

-- Current code --
1 	    asyncSendClientConnectionHeader();
2 	    synchronized (muxLock) {
3 		while (!muxDown && !clientConnectionReady) {
4 		    try {
5 			muxLock.wait();		// REMIND: timeout?
6 		    } catch (InterruptedException e) {
7 			...
8 		    }
9 		}
10		if (muxDown) {
11		    IOException ioe = new IOException(muxDownMessage);
12		    ioe.initCause(muxDownCause);
13		    throw ioe;
14		}
15	    }

-- Explanation of the code --
This code handles the initial client-server handshake that starts a JERI
connection. In line 1, the client sends its 8-byte greeting to the
server. Then in the loop on lines 3-9, it waits for the server's
response. If the reader thread gets a satisfactory response from the
server, it sets clientConnectionReady=true and calls
muxLock.notifyAll(). In all other cases (aborted connection, mismatched
protocol version, etc.) the reader invokes Mux.setDown(), which sets
muxDown=true and calls muxLock.notifyAll(). In lines 10-14, it throws an
IOException if the handshake failed.
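To make the signaling concrete, here is a minimal, self-contained Java model of the handshake wait described above. The class HandshakeSketch and its methods headerAccepted() and awaitHandshake() are hypothetical stand-ins, not JERI code; only the field names (muxLock, muxDown, clientConnectionReady, muxDownMessage, muxDownCause) and setDown() mirror Mux. Unlike Mux, this sketch simply propagates InterruptedException. It shows that the waiting thread can only leave the loop if the reader thread calls one of the two signaling methods; if neither ever happens, wait() blocks forever.

```java
import java.io.IOException;

// Hypothetical, simplified model of the Mux handshake signaling.
// Names mirror Mux, but this is NOT the JERI source.
public class HandshakeSketch {
    private final Object muxLock = new Object();
    private boolean muxDown = false;
    private boolean clientConnectionReady = false;
    private String muxDownMessage;
    private Throwable muxDownCause;

    // Reader thread: the server's handshake response was acceptable.
    void headerAccepted() {
        synchronized (muxLock) {
            clientConnectionReady = true;
            muxLock.notifyAll();
        }
    }

    // Reader thread: any failure (EOF, bad protocol version, ...).
    void setDown(String message, Throwable cause) {
        synchronized (muxLock) {
            if (!muxDown) {
                muxDown = true;
                muxDownMessage = message;
                muxDownCause = cause;
            }
            muxLock.notifyAll();
        }
    }

    // Invoker thread: blocks until one of the two methods above runs.
    void awaitHandshake() throws IOException, InterruptedException {
        synchronized (muxLock) {
            while (!muxDown && !clientConnectionReady) {
                muxLock.wait(); // no timeout: hangs if the reader never signals
            }
            if (muxDown) {
                IOException ioe = new IOException(muxDownMessage);
                ioe.initCause(muxDownCause);
                throw ioe;
            }
        }
    }

    public static void main(String[] args) throws Exception {
        HandshakeSketch ok = new HandshakeSketch();
        new Thread(ok::headerAccepted).start(); // simulated reader thread
        ok.awaitHandshake();
        System.out.println("handshake ok");

        HandshakeSketch bad = new HandshakeSketch();
        new Thread(() -> bad.setDown("mismatched protocol version", null)).start();
        try {
            bad.awaitHandshake();
        } catch (IOException e) {
            System.out.println("failed: " + e.getMessage());
        }
    }
}
```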

In my scenario (which uses simple TCP sockets, nothing fancy), the
invoker thread sits on line 5 indefinitely. My problem is hard to
reproduce, so I haven't found out what the server is doing in this case.
I hope to figure that out eventually, but presently I'm interested in
the "REMIND: timeout?" comment.

-- Timeout solution --
It seems obvious to me that there should be a timeout here. There are
lots of imaginable cases where the client could get stuck here:
server-side deadlock, abrupt server crash, logic error in client Mux
code. You'd expect that the server would either respond with its 8-byte
handshake very quickly or never, so a modest timeout (like 15 or 30
seconds) should be good. If that timeout is triggered, I would expect
that the code above would call Mux.setDown() and throw an IOException.
That exception would either cause a retry or be thrown up to the invoker
as a RemoteException.

-- Proposed code (untested) --
    // Replaces lines 3-9 of the current code (still inside synchronized (muxLock)).
    long now = System.currentTimeMillis();
    long endTime = now + timeoutMillis;
    while (!muxDown && !clientConnectionReady) {
        if (now >= endTime) {
            // setDown() sets muxDown=true, so the loop exits
            setDown("timeout waiting for server to respond to handshake", null);
        } else {
            try {
                muxLock.wait(endTime - now);   // always > 0 here, never wait(0)
                now = System.currentTimeMillis();
            } catch (InterruptedException e) {
                setDown("interrupt waiting for connection header", e);
            }
        }
    }

This code assumes a configurable timeoutMillis parameter has been set
earlier.
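A standalone version of the proposed timed wait can be exercised without a server. This is a sketch under assumptions: TimedHandshakeSketch and its constructor argument timeoutMillis are hypothetical, and it uses System.nanoTime() instead of System.currentTimeMillis() because nanoTime is monotonic and unaffected by wall-clock adjustments. It also guards against passing 0 to Object.wait(long), which means "wait until notified" with no timeout at all, and it restores the thread's interrupt status before marking the mux down, a common Java idiom that the proposal above does not include.

```java
import java.io.IOException;
import java.util.concurrent.TimeUnit;

// Hypothetical sketch of the proposed timed handshake wait.
// timeoutMillis is an assumed parameter; this is not JERI code.
public class TimedHandshakeSketch {
    private final Object muxLock = new Object();
    private boolean muxDown = false;
    private boolean clientConnectionReady = false;
    private String muxDownMessage;
    private Throwable muxDownCause;
    private final long timeoutMillis;

    TimedHandshakeSketch(long timeoutMillis) {
        this.timeoutMillis = timeoutMillis;
    }

    void headerAccepted() {
        synchronized (muxLock) {
            clientConnectionReady = true;
            muxLock.notifyAll();
        }
    }

    void setDown(String message, Throwable cause) {
        synchronized (muxLock) {
            if (!muxDown) {
                muxDown = true;
                muxDownMessage = message;
                muxDownCause = cause;
            }
            muxLock.notifyAll();
        }
    }

    void awaitHandshake() throws IOException {
        synchronized (muxLock) {
            // Monotonic deadline: clock changes cannot stretch or shrink it.
            long deadline = System.nanoTime() + TimeUnit.MILLISECONDS.toNanos(timeoutMillis);
            while (!muxDown && !clientConnectionReady) {
                long remainingNanos = deadline - System.nanoTime();
                if (remainingNanos <= 0) {
                    setDown("timeout waiting for server to respond to handshake", null);
                } else {
                    try {
                        // Round up so we never call wait(0), which waits forever.
                        muxLock.wait(TimeUnit.NANOSECONDS.toMillis(remainingNanos) + 1);
                    } catch (InterruptedException e) {
                        Thread.currentThread().interrupt(); // preserve interrupt status
                        setDown("interrupt waiting for connection header", e);
                    }
                }
            }
            if (muxDown) {
                IOException ioe = new IOException(muxDownMessage);
                ioe.initCause(muxDownCause);
                throw ioe;
            }
        }
    }

    public static void main(String[] args) {
        TimedHandshakeSketch silent = new TimedHandshakeSketch(200);
        long start = System.nanoTime();
        try {
            silent.awaitHandshake(); // no reader ever signals: simulates a hung server
        } catch (IOException e) {
            long ms = TimeUnit.NANOSECONDS.toMillis(System.nanoTime() - start);
            System.out.println(e.getMessage() + " after ~" + ms + " ms");
        }
    }
}
```

With no simulated reader, main() should print the timeout message after roughly 200 ms instead of hanging, which is the behavior the proposal is after.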

I can't think of any alternative solutions. Putting the timeout in the
Reader logic seems higher risk. There's incomplete code in JERI to
implement a ping packet (see Mux.asyncSendPing, never used), but that
would only be relevant after the initial handshake and wouldn't help
here.

Thanks,
Chris

