tomcat-dev mailing list archives

From Peter Rossbach ...@objektpark.de>
Subject Re: Rolling 5.5.25?
Date Fri, 17 Aug 2007 19:20:40 GMT
Hi Filip,

OK, but the second one is a real problem, and the first one you can fix ;-)
Could you fix it by calling checkExpire in the RecoveryThread?

Peter


On 17.08.2007 at 21:11, Filip Hanik - Dev Lists wrote:

> There are a few drawbacks to my current implementation that I need
> to think about:
>
> 1. I also reset the membership map; this should probably not be
> done at all.
> 2. During a failure, since I invoke stop to reset the thread, I
> am no longer sending out "member disappeared" messages, as the
> service is not running.
>
> Filip
>
> Filip Hanik - Dev Lists wrote:
>> Hi Peter,
>> here is the SVN link:
>> http://svn.apache.org/viewvc?view=rev&revision=567104
>>
>> Basically, what I do is this: in the receiver/sender thread, if an
>> error happens, I increment a counter.
>> This counter also gets decremented upon success.
>> After X consecutive failures, I launch a new thread,
>> called a RecoveryThread.
>> This thread simply invokes stop->init->start until it succeeds.
>>
>> The recovery thread is set up as a singleton, i.e., only one can run
>> at any point in time.
>>
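The counter-and-recovery scheme described above can be sketched roughly as follows. This is a minimal illustration, not the code from revision 567104: the class name RecoverySketch, the Restartable interface, the threshold and the retry interval are all hypothetical stand-ins, and flooring the counter at zero on success is an assumption.

```java
import java.util.concurrent.atomic.AtomicBoolean;
import java.util.concurrent.atomic.AtomicInteger;

public class RecoverySketch {
    /** Hypothetical stand-in for a service offering stop/init/start. */
    public interface Restartable {
        void stop() throws Exception;
        void init() throws Exception;
        void start() throws Exception;
    }

    private final AtomicInteger failures = new AtomicInteger();
    // Guards the "only one recovery thread at a time" rule.
    private final AtomicBoolean recovering = new AtomicBoolean(false);
    private final Restartable service;
    private final int threshold;      // the "X" failures from the description
    private final long retryMillis;   // pause between recovery attempts

    public RecoverySketch(Restartable service, int threshold, long retryMillis) {
        this.service = service;
        this.threshold = threshold;
        this.retryMillis = retryMillis;
    }

    /** Called by the sender/receiver loop after a successful operation. */
    public void onSuccess() {
        // Decrement, but never below zero (assumption).
        failures.updateAndGet(n -> n > 0 ? n - 1 : 0);
    }

    /**
     * Called by the sender/receiver loop after an error.
     * Returns the recovery thread if one was started, otherwise null.
     */
    public Thread onFailure() {
        if (failures.incrementAndGet() < threshold) {
            return null;
        }
        if (!recovering.compareAndSet(false, true)) {
            return null; // a recovery thread is already running
        }
        Thread t = new Thread(this::recover, "RecoverySketch-Recovery");
        t.setDaemon(true);
        t.start();
        return t;
    }

    /** Repeats stop -> init -> start until the whole sequence succeeds. */
    private void recover() {
        try {
            while (true) {
                try {
                    service.stop();
                    service.init();
                    service.start();
                    failures.set(0);
                    return;
                } catch (Exception e) {
                    try {
                        Thread.sleep(retryMillis);
                    } catch (InterruptedException ie) {
                        Thread.currentThread().interrupt();
                        return;
                    }
                }
            }
        } finally {
            recovering.set(false);
        }
    }
}
```

The compareAndSet guard is one simple way to get the singleton behaviour: a second caller that reaches the threshold while recovery is in progress just returns without starting another thread.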
>> I think you'll find that the solution in 6 is much simpler, as I
>> don't have to change any code in the existing membership stuff.
>> I had to pull out some initialization from the constructor into  
>> the init() method, but after that I could use stop/init/start
>> without changing the sender or receiver threads.
>>
>> I also changed the logging a little bit, only logging the error
>> once (after that it logs at debug level) to avoid filling up the logs.
>> The recovery thread will log every 5 seconds.
>>
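The "log the error once, then drop to debug" idea mentioned above could look something like the sketch below. It uses java.util.logging so that it is self-contained, with Level.FINE standing in for debug; the class name LogOnceSketch and the reset on success are assumptions, not something taken from the actual change.

```java
import java.util.logging.Level;
import java.util.logging.Logger;

public class LogOnceSketch {
    private static final Logger log =
            Logger.getLogger(LogOnceSketch.class.getName());

    // True once the current run of failures has been reported at WARNING.
    private boolean errorReported = false;

    /** Logs the failure and returns the level that was used. */
    public synchronized Level reportFailure(Exception x) {
        Level level = errorReported ? Level.FINE : Level.WARNING;
        errorReported = true;
        log.log(level, "Unable to send mcast message.", x);
        return level;
    }

    /** Re-arms the WARNING for the next failure (assumption). */
    public synchronized void reportSuccess() {
        errorReported = false;
    }
}
```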
>> So to really answer your question after all my bla bla,
>> Yes, the only option is to shut down the socket and start a new  
>> one. But to get it done right, I rely on the McastServiceImpl to  
>> do the right thing during stop() and start(),
>> instead of recoding that into a new method
>>
>> Filip
>>
>> Peter Rossbach wrote:
>>> Hi Filip,
>>>
>>> can you explain your 6.0.x fix
>>> (http://issues.apache.org/bugzilla/show_bug.cgi?id=40042) a little bit, please?
>>> I think our only chance to recover membership after a cluster
>>> membership send failure is to reopen the socket.
>>>
>>> Here is my current cluster 5.5 fix:
>>>
>>> ==
>>>     public class SenderThread extends Thread {
>>>         long time;
>>>         McastServiceImpl service ;
>>>         public SenderThread(long time, McastServiceImpl service) {
>>>             this.time = time;
>>>             this.service = service ;
>>>             setName("Cluster-MembershipSender");
>>>
>>>         }
>>>         public void run() {
>>>             long retry = 0 ;
>>>             while ( doRun ) {
>>>                 try {
>>>                     send();
>>>                     retry = 0;
>>>                 } catch ( Exception x ) {
>>>                     // FIXME: Only increment when the network is really
>>>                     // down: NoRouteToHostException or BindException
>>>                     retry++ ;
>>>                     log.warn("Unable to send mcast message.",x);
>>>                 }
>>>
>>>                 if(retry > 0) {
>>>                     if(retry * time < timeToExpiration ) {
>>>                         try {
>>>                             Thread.sleep(time);
>>>                         } catch ( Exception ignore ) {}
>>>                         restartHeartbeat(retry);
>>>                     } else {
>>>                         long recover = retry % 10 ;
>>>                         try {
>>>                             Thread.sleep((recover+1)*time);
>>>                         } catch ( Exception ignore ) {}
>>>                         if( recover == 0) {
>>>                             restartHeartbeat(retry) ;
>>>                         }
>>>                     }
>>>                 }
>>>             }
>>>         }
>>>
>>>         private void restartHeartbeat(long retry) {
>>>             try {
>>>                 socket.leaveGroup(address);
>>>             } catch (IOException ignore) {}
>>>             try {
>>>                 log.warn("Restarting membership heartbeat after "
>>>                         + "send failure (number of recovery " + retry + ")");
>>>                 service.setupSocket();
>>>                 socket.joinGroup(address);
>>>             } catch (IOException ignore) {}
>>>         }
>>>
>>>     }//class SenderThread
>>> ===
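The retry schedule in the SenderThread above can be read as a pure function of the retry count, which makes it easier to see when the heartbeat is restarted. The sketch below only restates that branching; the class and method names are hypothetical, and the numbers in the usage note are illustrative rather than Tomcat defaults.

```java
public class RetryScheduleSketch {
    /** How long the sender sleeps after the given number of failed sends. */
    public static long sleepMillis(long retry, long time, long timeToExpiration) {
        if (retry * time < timeToExpiration) {
            return time;                     // early phase: fixed interval
        }
        return ((retry % 10) + 1) * time;    // later phase: 1x to 10x interval
    }

    /** Whether the heartbeat socket is restarted after this failure. */
    public static boolean shouldRestart(long retry, long time, long timeToExpiration) {
        if (retry * time < timeToExpiration) {
            return true;                     // early phase: restart every time
        }
        return retry % 10 == 0;              // later phase: every tenth retry
    }
}
```

With time = 500 and timeToExpiration = 3000, retries 1 to 5 each sleep 500 ms and restart the heartbeat; retry 6 sleeps 3500 ms without a restart; retry 10 sleeps 500 ms and restarts again.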
>>> peter
>>>
>>>
>>>
>>> On 17.08.2007 at 19:56, Filip Hanik - Dev Lists wrote:
>>>
>>>> Rainer Jung wrote:
>>>>> Looks like an active weekend then ;)
>>>> I'm sorry, I just reread it: Friday. Friday next week is totally
>>>> fine. No one should have to work on a weekend.
>>>> Also, for the mcast problem, I'm implementing a fix in 6.0 and
>>>> 6.x; you should be able to copy that one.
>>>>
>>>> Filip
>>>>
>>>>>
>>>>> I think that will suffice.
>>>>>
>>>>> Regards,
>>>>>
>>>>> Rainer
>>>>>
>>>>> Filip Hanik - Dev Lists wrote:
>>>>>> Sounds good, let's shoot for Tue or Wed next week then.
>>>>>>
>>>>>> Filip
>>>>>
>>>>> ------------------------------------------------------------------ 
>>>>> ---
>>>>> To unsubscribe, e-mail: dev-unsubscribe@tomcat.apache.org
>>>>> For additional commands, e-mail: dev-help@tomcat.apache.org
>>>>>
>>>>>
>>>>>
>>>>
>>>>
>>>>
>>>>
>>>
>>>
>>
>>
>>
>>
>>
>
>
>
>

