tomcat-dev mailing list archives

From Filip Hanik - Dev Lists <devli...@hanik.com>
Subject Re: Rolling 5.5.25?
Date Fri, 17 Aug 2007 19:39:11 GMT
Peter Rossbach wrote:
> Hi Filip,
>
> OK, but the second one is a real problem, and the first one you can fix ;-)
> Can you fix it by calling checkExpire in the RecoveryThread?
I don't know about this one, I could call checkExpire, but if the 
datagram socket is down, then is the expiration real?
I guess this should be done, to still guarantee correct notifications 
according to how it works.
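
To make the question concrete, here is a minimal sketch of what an expiry
pass could look like, assuming members are tracked by the time of their
last heartbeat. The names MembershipExpirySketch, heartbeat and expire are
made up for illustration and are not the McastServiceImpl API; only
timeToExpiration is borrowed from the 5.5 code quoted further down.

```java
import java.util.ArrayList;
import java.util.Iterator;
import java.util.List;
import java.util.Map;
import java.util.concurrent.ConcurrentHashMap;

// Hypothetical sketch, not the real Tomcat membership code.
final class MembershipExpirySketch {
    // member name -> timestamp (ms) of the last heartbeat we received
    private final Map<String, Long> lastHeard = new ConcurrentHashMap<>();
    private final long timeToExpiration;

    MembershipExpirySketch(long timeToExpiration) {
        this.timeToExpiration = timeToExpiration;
    }

    // Record that a heartbeat from this member arrived at nowMs.
    void heartbeat(String member, long nowMs) {
        lastHeard.put(member, nowMs);
    }

    // Remove and return every member that has been silent for longer than
    // timeToExpiration. The caller would send one "member disappeared"
    // notification per returned name.
    List<String> expire(long nowMs) {
        List<String> expired = new ArrayList<>();
        Iterator<Map.Entry<String, Long>> it = lastHeard.entrySet().iterator();
        while (it.hasNext()) {
            Map.Entry<String, Long> e = it.next();
            if (nowMs - e.getValue() > timeToExpiration) {
                expired.add(e.getKey());
                it.remove();
            }
        }
        return expired;
    }

    int size() {
        return lastHeard.size();
    }
}
```

The catch is visible in the sketch: while our own socket is down, no
heartbeats arrive at all, so every remote member ends up in the expired
list even though it may still be alive.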

In a situation like this, your cluster will be out of sync, since once 
the network card is back up, no state transfer is initiated again.
What are your thoughts?
Filip

>
> Peter
>
>
> On 17.08.2007 at 21:11, Filip Hanik - Dev Lists wrote:
>
>> There are a few drawbacks to my current implementation that I need to 
>> think about. These are:
>>
>> 1. I also reset the membership map; this should probably not be done 
>> at all.
>> 2. During a failure, since I invoke stop to reset the thread, I am 
>> no longer sending out "member disappeared" messages, as the service is 
>> not running.
>>
>> Filip
>>
>> Filip Hanik - Dev Lists wrote:
>>> Hi Peter,
>>> here is the SVN link:
>>> http://svn.apache.org/viewvc?view=rev&revision=567104
>>>
>>> Basically, in the receiver/sender thread, if an error 
>>> happens, I increment a counter.
>>> This counter also gets decremented upon success.
>>> After X consecutive failures, I launch a new thread, 
>>> called a RecoveryThread.
>>> This thread simply invokes stop->init->start until it succeeds.
>>>
>>> The recovery thread is set up as a singleton, i.e., only one can run at 
>>> any point in time.
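
For readers who do not want to open the revision, the mechanism described
above can be sketched roughly as follows. This is not the committed code:
RecoverySketch, Lifecycle, onSuccess, onFailure, threshold and retryDelayMs
are all assumed names, and the stop/init/start interface only stands in for
the real service.

```java
import java.util.concurrent.atomic.AtomicBoolean;
import java.util.concurrent.atomic.AtomicInteger;

// Hypothetical sketch of "count failures, then recover on one thread".
final class RecoverySketch {
    // Stand-in for the service lifecycle; not the real Tomcat interface.
    public interface Lifecycle {
        void stop() throws Exception;
        void init() throws Exception;
        void start() throws Exception;
    }

    private final Lifecycle service;
    private final int threshold;      // the "X" failures before recovery starts
    private final long retryDelayMs;  // pause between recovery attempts
    private final AtomicInteger errorCounter = new AtomicInteger();
    private final AtomicBoolean recovering = new AtomicBoolean(false);

    RecoverySketch(Lifecycle service, int threshold, long retryDelayMs) {
        this.service = service;
        this.threshold = threshold;
        this.retryDelayMs = retryDelayMs;
    }

    // Called by the sender/receiver loop after a successful operation.
    void onSuccess() {
        // Decrement, but never below zero.
        errorCounter.updateAndGet(c -> c > 0 ? c - 1 : 0);
    }

    // Called after a failed operation. Returns the recovery thread if this
    // call launched one, otherwise null.
    Thread onFailure() {
        if (errorCounter.incrementAndGet() < threshold) {
            return null;
        }
        // compareAndSet lets at most one recovery thread run at a time.
        if (!recovering.compareAndSet(false, true)) {
            return null;
        }
        Thread t = new Thread(this::recover, "Membership-Recovery");
        t.setDaemon(true);
        t.start();
        return t;
    }

    // Repeat stop -> init -> start until the whole sequence succeeds.
    private void recover() {
        try {
            while (true) {
                try {
                    service.stop();
                    service.init();
                    service.start();
                    errorCounter.set(0);
                    return;
                } catch (Exception x) {
                    try {
                        Thread.sleep(retryDelayMs);
                    } catch (InterruptedException ie) {
                        Thread.currentThread().interrupt();
                        return;
                    }
                }
            }
        } finally {
            recovering.set(false);
        }
    }

    boolean isRecovering() {
        return recovering.get();
    }

    int errorCount() {
        return errorCounter.get();
    }
}
```

The sender and receiver loops would call onSuccess() or onFailure() after
each attempt and otherwise stay unchanged.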
>>>
>>> I think you'll find that the solution in 6 is much simpler, as I 
>>> don't have to change any code in the existing membership stuff.
>>> I had to pull some initialization out of the constructor into the 
>>> init() method, but after that I could use stop/init/start
>>> without changing the sender or receiver threads.
>>>
>>> I also changed the logging a little bit, logging the error only once 
>>> (after that it is logged at debug level) to avoid filling up the logs.
>>> The recovery thread will log every 5 seconds.
>>>
>>> So to really answer your question after all my bla bla:
>>> yes, the only option is to shut down the socket and start a new one. 
>>> But to get it done right, I rely on the McastServiceImpl to do the 
>>> right thing during stop() and start(),
>>> instead of recoding that into a new method.
>>>
>>> Filip
>>>
>>> Peter Rossbach wrote:
>>>> Hi Filip,
>>>>
>>>> can you explain your 6.0.x fix 
>>>> (http://issues.apache.org/bugzilla/show_bug.cgi?id=40042) a 
>>>> little bit, please?
>>>> I think our only chance to recover membership after a cluster 
>>>> membership send failure is to reopen the socket.
>>>>
>>>> Here is my current cluster 5.5 fix:
>>>>
>>>> ==
>>>>     public class SenderThread extends Thread {
>>>>         long time;
>>>>         McastServiceImpl service ;
>>>>         public SenderThread(long time, McastServiceImpl service) {
>>>>             this.time = time;
>>>>             this.service = service ;
>>>>             setName("Cluster-MembershipSender");
>>>>
>>>>         }
>>>>         public void run() {
>>>>             long retry = 0 ;
>>>>             while ( doRun ) {
>>>>                 try {
>>>>                     send();
>>>>                     retry = 0;
>>>>                 } catch ( Exception x ) {
>>>>                     // FIXME: Only increment if the network is really down:
>>>>                     // NoRouteToHostException or BindException
>>>>                     retry++ ;
>>>>                     log.warn("Unable to send mcast message.",x);
>>>>                 }
>>>>
>>>>                 if(retry > 0) {
>>>>                     if(retry * time < timeToExpiration ) {
>>>>                         try {
>>>>                             Thread.sleep(time);
>>>>                         } catch ( Exception ignore ) {}
>>>>                         restartHeartbeat(retry);
>>>>                     } else {
>>>>                         long recover = retry % 10 ;
>>>>                         try {
>>>>                             Thread.sleep((recover+1)*time);
>>>>                         } catch ( Exception ignore ) {}
>>>>                         if( recover == 0) {
>>>>                             restartHeartbeat(retry) ;
>>>>                         }
>>>>                     }
>>>>                 }
>>>>             }
>>>>         }
>>>>
>>>>         private void restartHeartbeat(long retry) {
>>>>             try {
>>>>                 socket.leaveGroup(address);
>>>>             } catch (IOException ignore) {}
>>>>             try {
>>>>                 log.warn("Restarting membership heartbeat after "
>>>>                         + "send failure (number of recovery " + retry + ")");
>>>>                 service.setupSocket();
>>>>                 socket.joinGroup(address);
>>>>             } catch (IOException ignore) {}
>>>>         }
>>>>
>>>>     }//class SenderThread
>>>> ===
>>>> peter
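
The retry schedule in the 5.5 fix above is easier to follow when pulled out
of the loop. The sketch below restates only the arithmetic from run(); the
class and method names (BackoffSketch, sleepMillis, shouldRestart) are
assumptions made for illustration, and nothing here touches a socket.

```java
// Hypothetical restatement of the retry arithmetic in SenderThread.run().
final class BackoffSketch {
    // How long the loop sleeps after the given number of consecutive
    // failures; 0 when there is no pending failure.
    static long sleepMillis(long retry, long time, long timeToExpiration) {
        if (retry <= 0) {
            return 0;
        }
        if (retry * time < timeToExpiration) {
            // Still inside the expiration window: sleep one interval.
            return time;
        }
        // Past the window: sleep 1x to 10x the interval, cycling with retry.
        return (retry % 10 + 1) * time;
    }

    // Whether restartHeartbeat would be called on this pass.
    static boolean shouldRestart(long retry, long time, long timeToExpiration) {
        if (retry <= 0) {
            return false;
        }
        if (retry * time < timeToExpiration) {
            return true;            // restart on every failure at first
        }
        return retry % 10 == 0;     // later only on every tenth failure
    }
}
```

With time = 500 and timeToExpiration = 3000, the first five failures each
sleep 500 ms and restart the heartbeat. From the sixth failure on, the
sleep grows with retry % 10 and the restart happens only when retry is a
multiple of 10.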
>>>>
>>>>
>>>>
>>>> On 17.08.2007 at 19:56, Filip Hanik - Dev Lists wrote:
>>>>
>>>>> Rainer Jung wrote:
>>>>>> Looks like an active weekend then ;)
>>>>> I'm sorry, I just reread it: Friday. Friday next week is totally fine. 
>>>>> No one should have to work on a weekend.
>>>>> Also, for the mcast problem, I'm implementing a fix in 6.0 and 
>>>>> 6.x; you should be able to copy that one.
>>>>>
>>>>> Filip
>>>>>
>>>>>>
>>>>>> I think that will suffice.
>>>>>>
>>>>>> Regards,
>>>>>>
>>>>>> Rainer
>>>>>>
>>>>>> Filip Hanik - Dev Lists wrote:
>>>>>>> Sounds good, let's shoot for Tue or Wed next week then.
>>>>>>>
>>>>>>> Filip
>>>>>>
>>>>>> ---------------------------------------------------------------------

>>>>>>
>>>>>> To unsubscribe, e-mail: dev-unsubscribe@tomcat.apache.org
>>>>>> For additional commands, e-mail: dev-help@tomcat.apache.org
>>>>>>
>>>>>>
>>>>>>
>>>>>
>>>>>
>>>>>
>>>>>
>>>>
>>>>
>>>> ------------------------------------------------------------------------

>>>>
>>>>
>>>> No virus found in this incoming message.
>>>> Checked by AVG Free Edition. Version: 7.5.484 / Virus Database: 
>>>> 269.12.0/957 - Release Date: 8/16/2007 1:46 PM
>>>>
>>>
>>>
>>>
>>>
>>>
>>
>>
>>
>>
>
>
>   



