tomcat-dev mailing list archives

From Filip Hanik - Dev Lists <devli...@hanik.com>
Subject Re: Rolling 5.5.25?
Date Fri, 17 Aug 2007 19:11:32 GMT
There are a few drawbacks to my current implementation that I need to 
think about:

1. I also reset the membership map; this should probably not be done at all.
2. During a failure, since I invoke stop to reset the thread, I am no 
longer sending out "member disappeared" messages, as the service is not 
running.
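
For readers following along, here is a minimal sketch of the recovery approach described in the quoted message below: a failure counter that is incremented on error and decremented on success, plus a single recovery thread that cycles stop->init->start until it succeeds. It is not the code from the commit. The class name `RecoverySketch`, the `Lifecycle` interface, and all parameter names are assumptions made for illustration only.

```java
import java.util.concurrent.atomic.AtomicBoolean;
import java.util.concurrent.atomic.AtomicInteger;

public class RecoverySketch {

    /** Hypothetical stand-in for the stop/init/start lifecycle of the membership service. */
    public interface Lifecycle {
        void stop() throws Exception;
        void init() throws Exception;
        void start() throws Exception;
    }

    private final Lifecycle service;
    private final int maxFailures;        // the "X" failures threshold (assumed configurable)
    private final long retryDelayMillis;  // pause between recovery attempts (assumed)
    private final AtomicInteger failures = new AtomicInteger();
    private final AtomicBoolean recovering = new AtomicBoolean(false);
    private volatile Thread recoveryThread;

    public RecoverySketch(Lifecycle service, int maxFailures, long retryDelayMillis) {
        this.service = service;
        this.maxFailures = maxFailures;
        this.retryDelayMillis = retryDelayMillis;
    }

    /** Called by the sender/receiver loop after a successful operation. */
    public void onSuccess() {
        // Decrement the counter, but never below zero.
        failures.updateAndGet(n -> n > 0 ? n - 1 : 0);
    }

    /** Called by the sender/receiver loop after a failed operation. */
    public void onFailure() {
        // compareAndSet guarantees that at most one recovery thread runs at a time.
        if (failures.incrementAndGet() >= maxFailures
                && recovering.compareAndSet(false, true)) {
            Thread t = new Thread(this::recover, "Membership-RecoveryThread");
            t.setDaemon(true);
            recoveryThread = t;
            t.start();
        }
    }

    /** Repeats stop->init->start until one full cycle succeeds. */
    private void recover() {
        try {
            while (!Thread.currentThread().isInterrupted()) {
                try {
                    service.stop();
                    service.init();
                    service.start();
                    failures.set(0);
                    return;
                } catch (Exception x) {
                    try {
                        Thread.sleep(retryDelayMillis);
                    } catch (InterruptedException ie) {
                        Thread.currentThread().interrupt();
                    }
                }
            }
        } finally {
            recovering.set(false);
        }
    }

    public int failureCount() { return failures.get(); }

    public boolean isRecovering() { return recovering.get(); }

    /** Waits for a running recovery thread, if any, to finish. */
    public void awaitRecovery() {
        Thread t = recoveryThread;
        if (t == null) return;
        try {
            t.join();
        } catch (InterruptedException ie) {
            Thread.currentThread().interrupt();
        }
    }
}
```

In this sketch, nothing is sent while `recover()` is between `stop()` and a successful `start()`, which is the same window in which drawback 2 above applies.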

Filip

Filip Hanik - Dev Lists wrote:
> Hi Peter,
> here is the SVN link:
> http://svn.apache.org/viewvc?view=rev&revision=567104
>
> Basically, in the receiver/sender thread, if an error happens, I 
> increment a counter.
> This counter also gets decremented upon success.
> After X consecutive failures, I launch a new thread, called a 
> RecoveryThread.
> This thread simply invokes stop->init->start until it succeeds.
>
> The recovery thread is set up as a singleton, i.e., only one can run 
> at any point in time.
>
> I think you'll find that the solution in 6 is much simpler, as I 
> don't have to change any code in the existing membership stuff.
> I had to pull some initialization out of the constructor into the 
> init() method, but after that I could use stop/init/start
> without changing the sender or receiver threads.
>
> I also changed the logging a little bit: the error is logged only 
> once (after that it is logged at debug level) to avoid filling up 
> the logs.
> The recovery thread will log every 5 seconds.
>
> So, to really answer your question after all my blah blah:
> yes, the only option is to shut down the socket and start a new one. 
> But to get it done right, I rely on the McastServiceImpl to do the 
> right thing during stop() and start(),
> instead of recoding that into a new method.
>
> Filip
>
> Peter Rossbach wrote:
>> Hi Filip,
>>
>> can you explain your 6.0.x fix 
>> (http://issues.apache.org/bugzilla/show_bug.cgi?id=40042) a little 
>> bit, please?
>> I think the only chance we have to recover membership after a cluster 
>> membership send failure is to reopen the socket.
>>
>> Here is my current cluster 5.5 fix:
>>
>> ==
>>     public class SenderThread extends Thread {
>>         long time;
>>         McastServiceImpl service ;
>>         public SenderThread(long time, McastServiceImpl service) {
>>             this.time = time;
>>             this.service = service ;
>>             setName("Cluster-MembershipSender");
>>
>>         }
>>         public void run() {
>>             long retry = 0 ;
>>             while ( doRun ) {
>>                 try {
>>                     send();
>>                     retry = 0;
>>                 } catch ( Exception x ) {
>>                     // FIXME: Only increment if the network is really
>>                     // down: NoRouteToHostException or BindException
>>                     retry++ ;
>>                     log.warn("Unable to send mcast message.",x);
>>                 }
>>
>>                 if(retry > 0) {
>>                     if(retry * time < timeToExpiration ) {
>>                         try {
>>                             Thread.sleep(time);
>>                         } catch ( Exception ignore ) {}
>>                        restartHeartbeat(retry);
>>                     } else {
>>                         long recover = retry % 10 ;
>>                         try {
>>                             Thread.sleep((recover+1)*time);
>>                         } catch ( Exception ignore ) {}
>>                         if( recover == 0) {
>>                             restartHeartbeat(retry) ;
>>                         }
>>                     }
>>                 }
>>             }
>>         }
>>
>>         private void restartHeartbeat(long retry) {
>>             try {
>>                 socket.leaveGroup(address);
>>             } catch (IOException ignore) {}
>>             try {
>>                 log.warn("Restarting membership heartbeat after send "
>>                     + "failure (number of recovery " + retry + ")");
>>                 service.setupSocket();
>>                 socket.joinGroup(address);
>>             } catch (IOException ignore) {}
>>         }
>>
>>     }//class SenderThread
>> ===
>> peter
>>
>>
>>
>> On 17.08.2007 at 19:56, Filip Hanik - Dev Lists wrote:
>>
>>> Rainer Jung wrote:
>>>> Looks like an active weekend then ;)
>>> I'm sorry, I just reread "Friday". Friday next week is totally fine. 
>>> No one should have to work on a weekend.
>>> Also, for the mcast problem, I'm implementing a fix in 6.0 and 6.x; 
>>> you should be able to copy that one.
>>>
>>> Filip
>>>
>>>>
>>>> I think that will suffice.
>>>>
>>>> Regards,
>>>>
>>>> Rainer
>>>>
>>>> Filip Hanik - Dev Lists wrote:
>>>>> Sounds good, let's shoot for Tue or Wed next week then.
>>>>>
>>>>> Filip
>>>>
>>>> ---------------------------------------------------------------------
>>>> To unsubscribe, e-mail: dev-unsubscribe@tomcat.apache.org
>>>> For additional commands, e-mail: dev-help@tomcat.apache.org
>>>>
>>>>
>>>>
>>>
>>
>>
>
