jakarta-jcs-users mailing list archives

From Matthew Cooke <mpcoo...@lineone.net>
Subject RE: Master cache machine no longer reachable causes spurious threads?
Date Thu, 06 Jan 2005 12:17:06 GMT
Setting a timeout on the RMI socket is done on construction using
something like this: 

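// Needs java.io.IOException, java.net.InetSocketAddress, java.net.ServerSocket,
// java.net.Socket and java.rmi.server.RMISocketFactory.
// timeoutMillis is a final int (milliseconds) supplied by the caller.
RMISocketFactory.setSocketFactory(new RMISocketFactory() {
    public Socket createSocket(String host, int port) throws IOException {
        Socket socket = new Socket();
        // Bound the connect attempt as well as later reads.
        socket.connect(new InetSocketAddress(host, port), timeoutMillis);
        socket.setSoTimeout(timeoutMillis);
        socket.setSoLinger(false, 0);
        return socket;
    }
    public ServerSocket createServerSocket(int port) throws IOException {
        return new ServerSocket(port);
    }
});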

I haven't tested this solution, but I had a peek at the JCS code and
it didn't look too hard to add. Asynchronous gets on blocking threads
with a timeout might be a good idea, but since timing out the RMI call
at the socket level looks simpler, it might be worth putting that in
for the interim.
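
To illustrate what setSoTimeout buys at the socket level, here is a minimal sketch, assuming a current JDK rather than the 1.4 runtime discussed in this thread. The class and method names are hypothetical and nothing here is JCS code. A loopback listener accepts the connection but never writes, standing in for a peer that has gone silent, and the read gives up after the timeout instead of blocking indefinitely.

```java
import java.io.IOException;
import java.io.UncheckedIOException;
import java.net.InetAddress;
import java.net.InetSocketAddress;
import java.net.ServerSocket;
import java.net.Socket;
import java.net.SocketTimeoutException;

/** Hypothetical demo class; not part of JCS. */
class SoTimeoutDemo {

    /**
     * Connects to a local listener that never replies and reports whether
     * the blocking read ended with a SocketTimeoutException.
     */
    static boolean readTimesOut(int timeoutMillis) {
        InetAddress loopback = InetAddress.getLoopbackAddress();
        // The listener is bound to an ephemeral port and never writes a byte.
        try (ServerSocket silentServer = new ServerSocket(0, 1, loopback);
             Socket socket = new Socket()) {
            // Bounded connect, then a bounded read.
            socket.connect(
                    new InetSocketAddress(loopback, silentServer.getLocalPort()),
                    timeoutMillis);
            socket.setSoTimeout(timeoutMillis);
            try {
                socket.getInputStream().read();
                return false; // data or end-of-stream arrived, so no timeout
            } catch (SocketTimeoutException expected) {
                return true;
            }
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
    }

    public static void main(String[] args) {
        System.out.println("read timed out: " + readTimesOut(200));
    }
}
```

Note that setSoTimeout only bounds reads on an established connection; the connect attempt needs its own timeout, which is why the sketch passes one to connect() as well.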

Matt.

On Wed, 2005-01-05 at 13:12 -0800, Smuts, Aaron wrote:
> I could use Doug Lea's FutureResult and call timedGet(millis).
> 
> http://gee.cs.oswego.edu/dl/classes/EDU/oswego/cs/dl/util/concurrent/FutureResult.html
> 
> This would add some overhead, but it would be safer.  
> 
> If a get times out, I can assume that the server is down and throw an error. Then the
> client will go zombie and start balking, just as it does when we shut down the server and
> not the machine.
> 
> I could start by putting it in just the RMI client, since we don't have the same problem
> elsewhere. I'll try something.
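
A minimal sketch of the timed-get idea discussed here, using java.util.concurrent (the standard-library successor to Doug Lea's util.concurrent) instead of FutureResult.timedGet. All names are hypothetical and this is not the JCS implementation. It assumes a timeout or failure should return null and flip the client into a "zombie" state in which later gets balk immediately; throwing an error instead, as suggested above, would be an equally valid choice.

```java
import java.util.concurrent.Callable;
import java.util.concurrent.ExecutionException;
import java.util.concurrent.ExecutorService;
import java.util.concurrent.Executors;
import java.util.concurrent.Future;
import java.util.concurrent.TimeUnit;
import java.util.concurrent.TimeoutException;

/** Hypothetical wrapper around a remote get; not a JCS class. */
class TimedRemoteGet {
    private final ExecutorService pool;
    private final long timeoutMillis;
    // Once set, the remote side is presumed down and calls return at once.
    private volatile boolean zombie = false;

    TimedRemoteGet(ExecutorService pool, long timeoutMillis) {
        this.pool = pool;
        this.timeoutMillis = timeoutMillis;
    }

    boolean isZombie() {
        return zombie;
    }

    /** Returns the remote value, or null on timeout, failure, or when balking. */
    <T> T get(Callable<T> remoteCall) {
        if (zombie) {
            return null; // balk: do not touch the remote at all
        }
        Future<T> future = pool.submit(remoteCall);
        try {
            return future.get(timeoutMillis, TimeUnit.MILLISECONDS);
        } catch (TimeoutException e) {
            // cancel(true) only interrupts the worker; a thread blocked in
            // socket I/O may not notice and can stay stuck until the socket fails.
            future.cancel(true);
            zombie = true;
            return null;
        } catch (ExecutionException e) {
            zombie = true; // assumption: any remote failure also means "down"
            return null;
        } catch (InterruptedException e) {
            Thread.currentThread().interrupt();
            return null;
        }
    }

    public static void main(String[] args) {
        ExecutorService pool = Executors.newCachedThreadPool();
        TimedRemoteGet remote = new TimedRemoteGet(pool, 100);
        System.out.println(remote.get(() -> "cached value"));
        pool.shutdownNow();
    }
}
```

The caller is never blocked for longer than the timeout, but the worker thread can be. That is why a bounded pool and a socket-level timeout would still matter with this approach.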
> 
> I think we need a general threadpool configuration mechanism. . . .
> 
> Aaron 
> 
> -----Original Message-----
> From: Hanson Char [mailto:hanson.char@gmail.com] 
> Sent: Wednesday, January 05, 2005 12:57 PM
> To: Turbine JCS Users List
> Subject: Re: Master cache machine no longer reachable causes spurious threads?
> 
> Just an idea: turn each get operation into an async operation (using a thread from a thread
> pool) with an optional timeout parameter (with, say, a default of 5 secs).
> 
> If the get doesn't finish within the timeout period, just terminate the thread and return
null.
> 
> So RMI or not, it's guaranteed not to block under all circumstances. 
> Probably something from Doug's concurrent (backport) library can be taken advantage of.
> 
> H
> 
> 
> On Wed, 5 Jan 2005 09:52:38 -0800, Smuts, Aaron <aaronsm@amazon.com> wrote:
> > If you know of a solution, please send it to me.
> > 
> > Thanks,
> > 
> > Aaron
> > 
> > -----Original Message-----
> > From: Matthew Cooke [mailto:mpcooke3@lineone.net]
> > Sent: Wednesday, January 05, 2005 2:12 AM
> > To: Turbine JCS Users List
> > Subject: Re: Master cache machine no longer reachable causes spurious threads?
> > 
> > Master-remote cache Sun JDK 1.4 on Redhat linux 7.3.
> > Client machines were Sun JDK 1.4 on linux(prod) and winXP(testing).
> > 
> > The problem was reproduced by executing "shutdown -h now" on the master cache machine
> > without cleanly killing the master remote cache running on it first. Client machines then
> > hang on gets for much longer than 30 seconds before throwing a NoRouteToHostException.
> > 
> > Currently we have no fix other than: don't kill the master cache machine suddenly, and
> > if the hardware dies, panic. Someone was investigating modifying the RMI settings, but
> > without success. I know it is possible by modifying the JCS/RMI code, as I see many other
> > RMI users have had similar issues (Google) and a fix is documented. I can probably dig it
> > up if useful.
> > 
> > Matt.
> > 
> > Smuts, Aaron wrote:
> > > I can't reproduce the issue.  I can get 30 second pauses if I pull the network
cable out, but not 15 minute locks.  I'm running the remote server on a windows box and hitting
it from a linux box.  I can disrupt things sometimes if I pull the network cable out of the
windows box running the server.  If I just kill the server everything is fine.  . . .   I'm
running jdk 1.4.2_04.
> > >
> > > What jdk and os are you using?
> > >
> > > Aaron
> > >
> > >
> > >
> > >
> > >
> > > -----Original Message-----
> > > From: Smuts, Aaron [mailto:aaronsm@amazon.com]
> > > Sent: Tuesday, January 04, 2005 1:53 PM
> > > To: Turbine JCS Users List; mail@timcocks.co.uk
> > > Subject: RE: Master cache machine no longer reachable causes spurious threads?
> > >
> > > The various RMI properties that can be set are listed here.
> > >
> > > http://java.sun.com/j2se/1.4.2/docs/guide/rmi/sunrmiproperties.html
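
As a sketch of how properties from that page might be applied, the snippet below sets two client-side timeouts programmatically. The helper class is hypothetical. The property names are Sun-implementation-specific settings documented for 1.4 and later, and they should be set before the first RMI call, on the assumption that the runtime may read them only once. Whether they shorten the 15-minute hang reported in this thread is not confirmed here.

```java
/** Hypothetical helper; not part of JCS. */
class RmiTimeoutProperties {

    /** Applies the same timeout, in milliseconds, to both properties. */
    static void apply(int millis) {
        String value = String.valueOf(millis);
        // Client-side read timeout while waiting for a remote call's response.
        System.setProperty("sun.rmi.transport.tcp.responseTimeout", value);
        // Timeout for the initial handshake on a newly opened connection.
        System.setProperty("sun.rmi.transport.tcp.handshakeTimeout", value);
    }

    public static void main(String[] args) {
        apply(5000);
        System.out.println(
                System.getProperty("sun.rmi.transport.tcp.responseTimeout"));
    }
}
```

The same values can be passed on the command line instead, for example -Dsun.rmi.transport.tcp.responseTimeout=5000, which avoids touching any source code.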
> > >
> > >
> > >
> > > -----Original Message-----
> > > From: Smuts, Aaron [mailto:aaronsm@amazon.com]
> > > Sent: Tuesday, January 04, 2005 1:40 PM
> > > To: mail@timcocks.co.uk
> > > Cc: turbine-jcs-user@jakarta.apache.org
> > > Subject: RE: Master cache machine no longer reachable causes spurious threads?
> > >
> > > Remove and put requests to the remote RMI server are done asynchronously;
> > > however, gets are synchronous.
> > >
> > > If a get locks up, then it could potentially block other put and remove requests
> > > locally. Are you seeing all requests block?
> > >
> > > Why is the situation different if the machine goes down, versus the rmi server
not running?  I haven't dug into the sun rmi code very far.
> > >
> > > What do you suggest?
> > >
> > > You could run in put only mode with remove on put set to false, if you frequently
have machines shutting down thereby killing the remote server.
> > >
> > > Aaron
> > >
> > > -----Original Message-----
> > > From: Tim Cocks [mailto:tcocks@gmail.com]
> > > Sent: Tuesday, December 07, 2004 9:53 AM
> > > To: Smuts, Aaron
> > > Cc: turbine-jcs-user@jakarta.apache.org
> > > Subject: Re: Master cache machine no longer reachable causes spurious threads?
> > >
> > > Thanks for your time. We are using the remote server. We have found it is
> > > almost exactly 15 minutes between when the machine running the master cache shuts down and
> > > when the clients realise the remote cache is no longer accessible. During those 15 minutes,
> > > calls to JCS block. After the 15 minutes, the calls return.
> > >
> > > The problem appears to be an RMI one. The fact the delay is consistently ~15
minutes seems to imply the timeout is working correctly, but is set too high.  We considered
changing the RMI timeouts by overriding RMISocketFactory. Unfortunately this would require
us to change the JCS source code, something we would like to avoid.
> > >
> > > Tim
> > >
> > > On Mon, 6 Dec 2004 13:45:59 -0800, Smuts, Aaron <aaronsm@amazon.com>
wrote:
> > >
> > >>I'll need to look into this.
> > >>
> > >>You are using the remote server?  The client reconnect must not be timing
out properly.
> > >>
> > >>Aaron
> > >>
> > >>
> > >>
> > >>
> > >>-----Original Message-----
> > >>From: Tim Cocks [mailto:tcocks@gmail.com]
> > >>Sent: Friday, December 03, 2004 2:39 AM
> > >>To: turbine-jcs-user@jakarta.apache.org
> > >>Subject: Master cache machine no longer reachable causes spurious threads?
> > >>
> > >>We use JCS outside of Turbine on about 20 machines connected to a JCS master
cache.
> > >>
> > >>On occasion we have had to kill the JCS master cache process and have observed
the client machines gracefully realise the master cache is no longer available.  They continue
to work indefinitely, albeit without access to the master cache.
> > >>
> > >>However, when the machine running the master cache goes down completely
the clients continue attempting to connect.  In the process, they are creating more and more
blocking threads and the JVM eventually terminates.
> > >>
> > >>Is this a known problem?  If so, are there any solutions?
> > >>
> > >>Thanks in advance for any help,
> > >>
> > >>Tim Cocks
> > >>
> > >>--------------------------------------------------------------------
> > >>-
> > >>To unsubscribe, e-mail:
> > >>turbine-jcs-user-unsubscribe@jakarta.apache.org
> > >>For additional commands, e-mail:
> > >>turbine-jcs-user-help@jakarta.apache.org
> > >>
> > >>
> > >
> > >
> > 
> >
> 
> 
-- 
Matthew Cooke <mpcooke3@lineone.net>


---------------------------------------------------------------------
To unsubscribe, e-mail: turbine-jcs-user-unsubscribe@jakarta.apache.org
For additional commands, e-mail: turbine-jcs-user-help@jakarta.apache.org

