Subject: RE: Master cache machine no longer reachable causes spurious threads?
From: Matthew Cooke
To: Turbine JCS Users List
Date: Thu, 06 Jan 2005 12:17:06 +0000

Setting a timeout on the RMI socket is done on construction, using something like this:

    // Needs java.io.IOException, java.net.ServerSocket, java.net.Socket and
    // java.rmi.server.RMISocketFactory. timeoutMillis must be a field or a
    // final local int. The RMI socket factory can only be set once per JVM.
    RMISocketFactory.setSocketFactory(new RMISocketFactory() {
        public Socket createSocket(String host, int port) throws IOException {
            Socket socket = new Socket(host, port);
            // Bounds blocking reads only; it does not bound the initial connect.
            socket.setSoTimeout(timeoutMillis);
            socket.setSoLinger(false, 0);
            return socket;
        }

        public ServerSocket createServerSocket(int port) throws IOException {
            return new ServerSocket(port);
        }
    });

I haven't tested this solution, but I had a peek at the code in JCS and it didn't look too hard.

Async gets that block with a timeout might be a good idea, but since timing out the RMI thread at the socket level looks simpler, it might be worth putting that in for the interim.

Matt.

On Wed, 2005-01-05 at 13:12 -0800, Smuts, Aaron wrote:
> I could use Doug Lea's FutureResult and call timedGet(millis).
>
> http://gee.cs.oswego.edu/dl/classes/EDU/oswego/cs/dl/util/concurrent/FutureResult.html
>
> This would add some overhead, but it would be safer.
>
> If a get times out, I can assume that the server is down and throw an error. Then the client will Zombie and start balking, just as it does when we shut down the server and not the machine.
>
> I could start by putting it in just the RMI client, since we don't have the same problem elsewhere. I'll try something.
>
> I think we need a general threadpool configuration mechanism. . . .
>
> Aaron
>
> -----Original Message-----
> From: Hanson Char [mailto:hanson.char@gmail.com]
> Sent: Wednesday, January 05, 2005 12:57 PM
> To: Turbine JCS Users List
> Subject: Re: Master cache machine no longer reachable causes spurious threads?
>
> Just an idea: turn each get operation into an async operation (using a thread from a thread pool) with an optional timeout parameter (with, say, a default of 5 secs).
>
> If the get doesn't finish within the timeout period, just terminate the thread and return null.
>
> So RMI or not, it's guaranteed not to block under all circumstances.
> Probably something from Doug's concurrent (backport) library can be taken advantage of.
>
> H
>
>
> On Wed, 5 Jan 2005 09:52:38 -0800, Smuts, Aaron wrote:
> > If you know of a solution, please send it to me.
> >
> > Thanks,
> >
> > Aaron
> >
> > -----Original Message-----
> > From: Matthew Cooke [mailto:mpcooke3@lineone.net]
> > Sent: Wednesday, January 05, 2005 2:12 AM
> > To: Turbine JCS Users List
> > Subject: Re: Master cache machine no longer reachable causes spurious threads?
> >
> > Master-remote cache: Sun JDK 1.4 on Red Hat Linux 7.3.
> > Client machines were Sun JDK 1.4 on Linux (prod) and WinXP (testing).
> >
> > The problem was reproduced by executing a "shutdown -h now" on the master cache machine without cleanly killing the master-remote cache running on it first. Client machines then hang on gets for much longer than 30 seconds before throwing a NoRouteToHostException.
> >
> > Currently we have no fix other than: don't kill the master cache machine suddenly, and if the hardware dies, panic. Someone was investigating modifying the RMI settings, but without success. I know it is possible by modifying the JCS/RMI code, as I see many other RMI users have had similar issues (Google) and a fix is documented. I can probably dig it up if useful.
> >
> > Matt.
> >
> > Smuts, Aaron wrote:
> > > I can't reproduce the issue.
> > > I can get 30 second pauses if I pull the network cable out, but not 15 minute locks. I'm running the remote server on a Windows box and hitting it from a Linux box. I can disrupt things sometimes if I pull the network cable out of the Windows box running the server. If I just kill the server everything is fine. . . . I'm running JDK 1.4.2_04.
> > >
> > > What JDK and OS are you using?
> > >
> > > Aaron
> > >
> > >
> > > -----Original Message-----
> > > From: Smuts, Aaron [mailto:aaronsm@amazon.com]
> > > Sent: Tuesday, January 04, 2005 1:53 PM
> > > To: Turbine JCS Users List; mail@timcocks.co.uk
> > > Subject: RE: Master cache machine no longer reachable causes spurious threads?
> > >
> > > The various RMI properties that can be set are listed here.
> > >
> > > http://java.sun.com/j2se/1.4.2/docs/guide/rmi/sunrmiproperties.html
> > >
> > >
> > > -----Original Message-----
> > > From: Smuts, Aaron [mailto:aaronsm@amazon.com]
> > > Sent: Tuesday, January 04, 2005 1:40 PM
> > > To: mail@timcocks.co.uk
> > > Cc: turbine-jcs-user@jakarta.apache.org
> > > Subject: RE: Master cache machine no longer reachable causes spurious threads?
> > >
> > > Remove and put requests to the remote RMI server are done asynchronously; however, gets are synchronous.
> > >
> > > If a get locks up, then it could potentially block other put and remove requests locally. Are you seeing all requests block?
> > >
> > > Why is the situation different if the machine goes down, versus the RMI server not running? I haven't dug into the Sun RMI code very far.
> > >
> > > What do you suggest?
> > >
> > > You could run in put-only mode with remove-on-put set to false, if you frequently have machines shutting down, thereby killing the remote server.
> > >
> > > Aaron
> > >
> > > -----Original Message-----
> > > From: Tim Cocks [mailto:tcocks@gmail.com]
> > > Sent: Tuesday, December 07, 2004 9:53 AM
> > > To: Smuts, Aaron
> > > Cc: turbine-jcs-user@jakarta.apache.org
> > > Subject: Re: Master cache machine no longer reachable causes spurious threads?
> > >
> > > Thanks for your time. We are using the remote server. We have found it is almost exactly 15 minutes between when the machine running the master cache shuts down and when the clients realise the remote cache is no longer accessible. During those 15 minutes, calls to JCS block.
> > > After the 15 minutes, the calls return.
> > >
> > > The problem appears to be an RMI one. The fact that the delay is consistently ~15 minutes seems to imply the timeout is working correctly, but is set too high. We considered changing the RMI timeouts by overriding RMISocketFactory. Unfortunately this would require us to change the JCS source code, something we would like to avoid.
> > >
> > > Tim
> > >
> > > On Mon, 6 Dec 2004 13:45:59 -0800, Smuts, Aaron wrote:
> > >
> > >>I'll need to look into this.
> > >>
> > >>You are using the remote server? The client reconnect must not be timing out properly.
> > >>
> > >>Aaron
> > >>
> > >>
> > >>-----Original Message-----
> > >>From: Tim Cocks [mailto:tcocks@gmail.com]
> > >>Sent: Friday, December 03, 2004 2:39 AM
> > >>To: turbine-jcs-user@jakarta.apache.org
> > >>Subject: Master cache machine no longer reachable causes spurious threads?
> > >>
> > >>We use JCS outside of Turbine on about 20 machines connected to a JCS master cache.
> > >>
> > >>On occasion we have had to kill the JCS master cache process and have observed the client machines gracefully realise the master cache is no longer available. They continue to work indefinitely, albeit without access to the master cache.
> > >>
> > >>However, when the machine running the master cache goes down completely the clients continue attempting to connect.
> > >>In the process, they are creating more and more blocking threads and the JVM eventually terminates.
> > >>
> > >>Is this a known problem? If so, are there any solutions?
> > >>
> > >>Thanks in advance for any help,
> > >>
> > >>Tim Cocks
> > >>
> > >>---------------------------------------------------------------------
> > >>To unsubscribe, e-mail: turbine-jcs-user-unsubscribe@jakarta.apache.org
> > >>For additional commands, e-mail: turbine-jcs-user-help@jakarta.apache.org

-- 
Matthew Cooke
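
For comparison with the socket-level timeout, here is a rough sketch of the other idea raised in the thread: running each get on a pooled thread and giving up after a timeout. It is not JCS code. It uses java.util.concurrent (the JDK 5+ descendant of Doug Lea's library, where Future.get(timeout, unit) plays the role of FutureResult.timedGet), and the names TimedGetSketch and getWithTimeout are made up for illustration. Returning null on timeout follows the suggestion quoted above; a real client might prefer to flag the remote cache as unavailable instead.

```java
import java.util.concurrent.Callable;
import java.util.concurrent.ExecutionException;
import java.util.concurrent.ExecutorService;
import java.util.concurrent.Executors;
import java.util.concurrent.Future;
import java.util.concurrent.TimeUnit;
import java.util.concurrent.TimeoutException;

// Hypothetical helper, not part of JCS.
final class TimedGetSketch {

    /**
     * Runs the lookup on the given pool and waits at most timeoutMillis.
     * Returns the lookup's result, or null if it timed out or failed.
     */
    static <T> T getWithTimeout(ExecutorService pool, Callable<T> lookup, long timeoutMillis) {
        Future<T> future = pool.submit(lookup);
        try {
            return future.get(timeoutMillis, TimeUnit.MILLISECONDS);
        } catch (TimeoutException e) {
            // Requests an interrupt. A thread blocked in classic socket I/O
            // usually ignores it and lingers until the socket itself gives up.
            future.cancel(true);
            return null;
        } catch (ExecutionException e) {
            // The lookup itself threw; treated here the same as a miss.
            return null;
        } catch (InterruptedException e) {
            // Restore the caller's interrupt flag and abandon the lookup.
            Thread.currentThread().interrupt();
            future.cancel(true);
            return null;
        }
    }

    public static void main(String[] args) {
        ExecutorService pool = Executors.newCachedThreadPool();
        try {
            // A lookup that returns promptly.
            String fast = getWithTimeout(pool, () -> "value", 1000);
            // A lookup that takes far longer than its timeout allows.
            String slow = getWithTimeout(pool, () -> {
                Thread.sleep(5000);
                return "late";
            }, 100);
            System.out.println("fast=" + fast + ", slow=" + slow);
        } finally {
            pool.shutdownNow();
        }
    }
}
```

The caller is unblocked after the timeout, but the worker thread may stay stuck in the RMI call, so this approach probably works best combined with a socket-level timeout rather than as a replacement for it. Otherwise an unbounded pool could still accumulate blocked threads while the master cache machine is unreachable.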