river-dev mailing list archives

From "Christopher Dolan" <christopher.do...@avid.com>
Subject RE: client hang in com.sun.jini.jeri.internal.mux.Mux.start()
Date Tue, 21 Jun 2011 13:23:20 GMT
Peter,
My patch for this problem was accepted into River 2.2.0 via
https://issues.apache.org/jira/browse/RIVER-397
Chris

-----Original Message-----
From: Peter Jones [mailto:pcj@roundroom.net] 
Sent: Monday, June 20, 2011 5:14 PM
To: dev@river.apache.org
Subject: Re: client hang in com.sun.jini.jeri.internal.mux.Mux.start()

Chris,

A bit late but FWIW: your reasoning and low-risk solution look right to
me-- I believe that it's pretty much what that "REMIND" comment was
intending.

(I think that there was some thought that this method shouldn't actually
block for the handshake response, instead just letting it get processed
asynchronously and only blocking when necessary later.  But, if I
remember correctly, given how the upper layers actually use this code, I
don't think that such a change would have a practical benefit for the
added complexity.)

Cheers,

-- Peter


On Apr 29, 2011, at 2:40 PM, Christopher Dolan wrote:

> I've experienced occasional cases where clients get stuck in the
> following block of code in Mux.start. Has anyone experienced this
> problem? I have a proposed solution below. Has anyone thought about a
> similar solution already?
> 
> -- Current code --
> 1 	    asyncSendClientConnectionHeader();
> 2 	    synchronized (muxLock) {
> 3 		while (!muxDown && !clientConnectionReady) {
> 4 		    try {
> 5 			muxLock.wait();		// REMIND: timeout?
> 6 		    } catch (InterruptedException e) {
> 7 			...
> 8 		    }
> 9 		}
> 10		if (muxDown) {
> 11		    IOException ioe = new IOException(muxDownMessage);
> 12		    ioe.initCause(muxDownCause);
> 13		    throw ioe;
> 14		}
> 15	    }
> 
> -- Explanation of the code --
> This code handles the initial client-server handshake that starts a JERI
> connection. In line 1, the client sends its 8-byte greeting to the
> server. Then in the loop on lines 3-9, it waits for the server's
> response. If the reader thread gets a satisfactory response from the
> server, it sets clientConnectionReady=true and calls
> muxLock.notifyAll(). In all other cases (aborted connection, mismatched
> protocol version, etc) the reader invokes Mux.setDown() which sets
> muxDown=true and calls muxLock.notifyAll(). In lines 10-14, it throws if
> the handshake was a failure.
> 
> In my scenario (which uses simple TCP sockets, nothing fancy), the
> invoker thread sits on line 5 indefinitely. My problem is hard to
> reproduce, so I haven't found out what the server is doing in this case.
> I hope to figure that out eventually, but presently I'm interested in
> the "REMIND: timeout?" comment.
> 
> -- Timeout solution --
> It seems obvious to me that there should be a timeout here. There are
> lots of imaginable cases where the client could get stuck here:
> server-side deadlock, abrupt server crash, logic error in client Mux
> code. You'd expect that the server would either respond with its 8-byte
> handshake very quickly or never, so a modest timeout (like 15 or 30
> seconds) should be good. If that timeout is triggered, I would expect
> that the code above would call Mux.setDown() and throw an IOException.
> That exception would either cause a retry or be thrown up to the invoker
> as a RemoteException.
> 
> -- Proposed code (untested) --
> 3 		long now = System.currentTimeMillis();
> 4 		long endTime = now + timeoutMillis;
> 5 		while (!muxDown && !clientConnectionReady) {
> 6 		    if (now >= endTime) {
> 7 			setDown("timeout waiting for server to respond to handshake", null);
> 8 		    } else {
> 9 			try {
> 10			    muxLock.wait(endTime - now);
> 11			    now = System.currentTimeMillis();
> 12			} catch (InterruptedException e) {
> 13			    setDown("interrupt waiting for connection header", e);
> 14			}
> 15		    }
> 16		}
> 
> This code assumes a configurable timeoutMillis parameter has been set
> earlier.
> 
> I can't think of any alternative solutions. Putting the timeout in the
> Reader logic seems higher risk. There's incomplete code in JERI to
> implement a ping packet (see Mux.asyncSendPing, never used), but that
> would only be relevant after the initial handshake and wouldn't help
> here.
> 
> Thanks,
> Chris
> 
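
For readers who want to try the timed wait from the quoted proposal outside of River, here is a minimal standalone sketch. It is not the patch that went into RIVER-397. HandshakeWaiter, handshakeReceived() and awaitHandshake() are hypothetical names invented for this illustration; only muxLock, muxDown, clientConnectionReady, setDown() and the two messages come from the thread. It also swaps System.currentTimeMillis() for System.nanoTime(), on the assumption that a monotonic clock is preferable for measuring a deadline, and it restores the interrupt flag, which the quoted proposal does not do.

```java
import java.io.IOException;

// Hypothetical stand-in for the handshake state kept by Mux; not River's class.
class HandshakeWaiter {
    private final Object muxLock = new Object();
    private boolean muxDown = false;
    private boolean clientConnectionReady = false;
    private String muxDownMessage;
    private Throwable muxDownCause;

    // Would be called by the reader thread once the server's handshake arrives.
    void handshakeReceived() {
        synchronized (muxLock) {
            clientConnectionReady = true;
            muxLock.notifyAll();
        }
    }

    // Marks the connection as failed and wakes any waiting thread.
    void setDown(String message, Throwable cause) {
        synchronized (muxLock) {
            if (muxDown) {
                return; // keep the first failure reason
            }
            muxDown = true;
            muxDownMessage = message;
            muxDownCause = cause;
            muxLock.notifyAll();
        }
    }

    // Blocks until the handshake succeeds, the mux goes down, or the timeout expires.
    void awaitHandshake(long timeoutMillis) throws IOException {
        final long deadline = System.nanoTime() + timeoutMillis * 1_000_000L;
        synchronized (muxLock) {
            while (!muxDown && !clientConnectionReady) {
                long remainingMillis = (deadline - System.nanoTime()) / 1_000_000L;
                if (remainingMillis <= 0) {
                    // Never call wait(0): that would block without any timeout.
                    setDown("timeout waiting for server to respond to handshake", null);
                } else {
                    try {
                        muxLock.wait(remainingMillis);
                    } catch (InterruptedException e) {
                        Thread.currentThread().interrupt(); // preserve interrupt status
                        setDown("interrupt waiting for connection header", e);
                    }
                }
            }
            if (muxDown) {
                IOException ioe = new IOException(muxDownMessage);
                ioe.initCause(muxDownCause);
                throw ioe;
            }
        }
    }

    public static void main(String[] args) {
        HandshakeWaiter waiter = new HandshakeWaiter();
        // Nobody calls handshakeReceived(), imitating a server that never answers.
        try {
            waiter.awaitHandshake(200);
        } catch (IOException e) {
            System.out.println("handshake failed: " + e.getMessage());
        }
    }
}
```

Recomputing the remaining time on every pass through the loop means a spurious wakeup or an unrelated notifyAll() cannot extend the total wait beyond the deadline. Because the remaining time is truncated to whole milliseconds, the sketch may give up slightly less than one millisecond early.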

