tomcat-dev mailing list archives

From Filip Hanik - Dev Lists <devli...@hanik.com>
Subject Re: Rolling 5.5.25?
Date Fri, 17 Aug 2007 19:09:32 GMT
Hi Peter,
here is the SVN link:
http://svn.apache.org/viewvc?view=rev&revision=567104

Basically, in the receiver/sender thread, I increment a counter whenever 
an error happens.
This counter also gets decremented upon success.
After X consecutive failures, I launch a new thread, called a 
RecoveryThread.
This thread simply invokes stop->init->start until it succeeds.

The recovery thread is set up as a singleton, i.e., only one can run at 
any point in time.
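
A minimal sketch of the counter plus single recovery thread described above, 
for illustration only. It is not the code from the SVN revision: the names 
Restartable and FailureTracker, the threshold, and the retry delay are all 
hypothetical, and the real fix works on McastServiceImpl directly.

```java
import java.util.concurrent.atomic.AtomicBoolean;
import java.util.concurrent.atomic.AtomicInteger;

/** Hypothetical stand-in for the membership service (not Tomcat's API). */
interface Restartable {
    void stop() throws Exception;
    void init() throws Exception;
    void start() throws Exception;
}

/** Counts send/receive failures and launches at most one recovery thread. */
class FailureTracker {
    private final Restartable service;
    private final int threshold;       // the "X" failures; value is an assumption
    private final long retryDelayMs;   // pause between recovery attempts; assumption
    private final AtomicInteger failures = new AtomicInteger();
    private final AtomicBoolean recovering = new AtomicBoolean(false);

    FailureTracker(Restartable service, int threshold, long retryDelayMs) {
        this.service = service;
        this.threshold = threshold;
        this.retryDelayMs = retryDelayMs;
    }

    /** Called by the sender/receiver loop after a successful operation. */
    void onSuccess() {
        failures.updateAndGet(n -> n > 0 ? n - 1 : 0);
    }

    /** Called on error; returns the recovery thread if one was launched, else null. */
    Thread onFailure() {
        if (failures.incrementAndGet() >= threshold
                && recovering.compareAndSet(false, true)) { // only one may run
            Thread t = new Thread(this::recover, "RecoveryThread-sketch");
            t.setDaemon(true);
            t.start();
            return t;
        }
        return null;
    }

    int failureCount() { return failures.get(); }

    boolean isRecovering() { return recovering.get(); }

    /** Repeats stop -> init -> start until the whole cycle succeeds. */
    private void recover() {
        try {
            while (true) {
                try {
                    service.stop();
                    service.init();
                    service.start();
                    failures.set(0);
                    return;
                } catch (Exception e) {
                    try {
                        Thread.sleep(retryDelayMs);
                    } catch (InterruptedException ie) {
                        Thread.currentThread().interrupt();
                        return;
                    }
                }
            }
        } finally {
            recovering.set(false);
        }
    }
}
```

The compareAndSet guard is one way to get the "only one at a time" behaviour 
without locking; a synchronized check would work just as well.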

I think you'll find that the solution in 6 is much simpler, as I don't 
have to change any code in the existing membership stuff.
I had to pull some initialization out of the constructor into the 
init() method, but after that I could use stop/init/start
without changing the sender or receiver threads.

I also changed the logging a little bit: the error is logged only once 
(after that it is logged at debug level) to avoid filling up the logs.
The recovery thread will log every 5 seconds.
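
The "log once, then debug" idea could look roughly like the sketch below. 
This is an assumption-laden illustration, not the committed code: it uses 
java.util.logging instead of Tomcat's own logging, the class name 
OnceLoudLogger is hypothetical, and resetting after a success is my guess 
at a sensible policy rather than something stated above.

```java
import java.util.logging.Level;
import java.util.logging.Logger;

/** Logs the first failure loudly and later ones at debug (FINE) level. */
class OnceLoudLogger {
    private final Logger log;
    private boolean alreadyReported = false;

    OnceLoudLogger(Logger log) {
        this.log = log;
    }

    /** Logs the failure and returns the level that was used. */
    synchronized Level report(String message, Throwable cause) {
        Level level = alreadyReported ? Level.FINE : Level.SEVERE;
        alreadyReported = true;
        log.log(level, message, cause);
        return level;
    }

    /** Assumed policy: after a success, the next failure is loud again. */
    synchronized void reset() {
        alreadyReported = false;
    }
}
```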

So, to really answer your question after all my bla bla:
yes, the only option is to shut down the socket and start a new one. But 
to get it done right, I rely on the McastServiceImpl to do the right 
thing during stop() and start(),
instead of recoding that into a new method.

Filip

Peter Rossbach wrote:
> Hi Filip,
>
> can you explain your 6.0.x fix 
> (http://issues.apache.org/bugzilla/show_bug.cgi?id=40042) a little 
> bit, please?
> I think our only chance to recover membership after a cluster 
> membership send failure is to reopen the socket.
>
> Here is my current cluster 5.5 fix:
>
> ==
>     public class SenderThread extends Thread {
>         long time;
>         McastServiceImpl service ;
>         public SenderThread(long time, McastServiceImpl service) {
>             this.time = time;
>             this.service = service ;
>             setName("Cluster-MembershipSender");
>
>         }
>         public void run() {
>             long retry = 0 ;
>             while ( doRun ) {
>                 try {
>                     send();
>                     retry = 0;
>                 } catch ( Exception x ) {
>                     // FIXME: Only increment when the network is really
>                     // down: NoRouteToHostException or BindException
>                     retry++ ;
>                     log.warn("Unable to send mcast message.",x);
>                 }
>
>                 if(retry > 0) {
>                     if(retry * time < timeToExpiration ) {
>                         try {
>                             Thread.sleep(time);
>                         } catch ( Exception ignore ) {}
>                         restartHeartbeat(retry);
>                     } else {
>                         long recover = retry % 10 ;
>                         try {
>                             Thread.sleep((recover+1)*time);
>                         } catch ( Exception ignore ) {}
>                         if( recover == 0) {
>                             restartHeartbeat(retry) ;
>                         }
>                     }
>                 }
>             }
>         }
>
>         private void restartHeartbeat(long retry) {
>             try {
>                 socket.leaveGroup(address);
>             } catch (IOException ignore) {}
>             try {
>                 log.warn("Restarting membership heartbeat after send "
>                         + "failure (number of recovery " + retry + ")");
>                 service.setupSocket();
>                 socket.joinGroup(address);
>             } catch (IOException ignore) {}
>         }
>
>     }//class SenderThread
> ===
> peter
>
>
>
> On 17.08.2007 at 19:56, Filip Hanik - Dev Lists wrote:
>
>> Rainer Jung wrote:
>>> Looks like an active weekend then ;)
>> I'm sorry, I just reread "Friday". Friday next week is totally fine. No 
>> one should have to work on a weekend.
>> Also, for the mcast problem, I'm implementing a fix in 6.0 and 6.x; 
>> you should be able to copy that one.
>>
>> Filip
>>
>>>
>>> I think that will suffice.
>>>
>>> Regards,
>>>
>>> Rainer
>>>
>>> Filip Hanik - Dev Lists wrote:
>>>> Sounds good, let's shoot for Tue or Wed next week then.
>>>>
>>>> Filip
>>>
>>> ---------------------------------------------------------------------
>>> To unsubscribe, e-mail: dev-unsubscribe@tomcat.apache.org
>>> For additional commands, e-mail: dev-help@tomcat.apache.org
>>>
>>>
>>>
>>
>>
>>
>>
>
>

