river-dev mailing list archives

From Gregg Wonderly <gr...@wonderly.org>
Subject Re: Question about LeaseRenewalManager and renewDuration
Date Tue, 10 Jul 2012 18:44:33 GMT
Recall that under the covers there are also all the OS network stack behaviors.  What is the
TCP SYN timeout, for example?  That is, how long will a TCP connect request that will eventually
fail take before failing?

I think it's important to understand that unless you are on either end of a TCP connection
with the timeout and keep-alive settings for that connection turned down to short intervals,
you're going to be mystified by the longer-than-expected timing of most failure detections.

Subclassing the appropriate endpoint class, adjusting its behavior, and using that on
your registrar may be part of what you need to do to see quick notifications.
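
As a rough illustration of the connect-timeout point, here is a minimal sketch using only plain `java.net` sockets. It is not River/Jini endpoint code; the class name `ConnectTimeoutSketch` and the method `canConnect` are made up for this example. It shows a connect attempt bounded by an explicit timeout, so the caller is not left waiting on the OS's own SYN retry schedule.

```java
import java.io.IOException;
import java.net.InetSocketAddress;
import java.net.Socket;

// Hypothetical helper, not part of River: bounds how long a TCP connect may take.
final class ConnectTimeoutSketch {

    /**
     * Tries to open a TCP connection, giving up after timeoutMillis.
     * Returns true if the connection was established, false if it was
     * refused, unreachable, or timed out.
     */
    static boolean canConnect(String host, int port, int timeoutMillis) {
        try (Socket socket = new Socket()) {
            // A timeout of 0 would mean "no Java-level limit", leaving the
            // wait entirely to the OS network stack.
            socket.connect(new InetSocketAddress(host, port), timeoutMillis);
            return true;
        } catch (IOException e) {
            // SocketTimeoutException and ConnectException are both IOExceptions.
            return false;
        }
    }
}
```

A custom endpoint or socket factory on the registrar side would presumably apply the same kind of bound to the connections it opens toward listeners, but that wiring is not shown here.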

Gregg Wonderly

On Jul 10, 2012, at 10:49 AM, Greg Trasuk wrote:

> 
> On Tue, 2012-07-10 at 10:14, Itai Frenkel wrote:
>>>> Are you sure about that?  
>> Looking at RegistrarImpl when ThrowableConstants.retryable(e) returns BAD_OBJECT,
>> it rethrows only if (e instanceof Error), otherwise it cancels the lease. Since ConnectException
>> is not an Error the lease would be canceled.
>> Why is the Error check being performed ?
>> 
> ThrowableConstants.retryable(e) only returns BAD_OBJECT if it receives a
> definite response from the remote endpoint.  For a comm failure, it
> should return INDEFINITE.  Having said that, the logic seems to favour
> declaring an exception "Definite" where it might be arguable.  For
> instance, it will declare BAD_OBJECT in the case of a "No route to host"
> exception, which arguably could be temporary, for instance if a router
> goes offline.
> 
>>>> Personally, I'd use an internal timer on the client side that says "if I
>>>> don't receive any events for a given time, I'll cancel the current lease and re-register".
>> That requires the Registrar to periodically send probe notifications. The number
>> of real world notifications could fluctuate from zero to high load and cannot be trusted without
>> probe notifications.
>> 
> Might be an interesting improvement if a client could request a
> heartbeat or supervisory message from the registrar.  But my point above
> was that if the events are not coming fast enough to satisfy a
> reasonable "liveness" timeout, then it's probably not a big problem if
> the client simply cancels the lease and re-registers.  So you could
> effectively implement your own heartbeat.
> 
> Alternately (subject to exploring the loading and the number of clients)
> you could create a service that does nothing but registers, then updates
> its service attributes periodically, which would have the effect of
> generating registrar messages.  Starting to get a little complicated and
> indirect, though.
> 
> In the end, however, it seems like you're trying to have the client find
> out that it's not receiving registrar notifications.  I can't think of
> any better evidence than "you're not receiving registrar notifications".
> 
> Cheers,
> 
> Greg.
> 
>> Thanks,
>> Itai
>> 
>> -----Original Message-----
>> From: Greg Trasuk [mailto:trasukg@stratuscom.com] 
>> Sent: Tuesday, July 10, 2012 4:36 PM
>> To: dev@river.apache.org
>> Subject: Re: Question about LeaseRenewalManager and renewDuration
>> 
>> 
>> On Tue, 2012-07-10 at 06:41, Itai Frenkel wrote:
>> <snip...>
>>> Background Information:
>>> The motivation for this is the way the Registrar handles event notifications.
>>> When the Registrar fails to send a notification to a listener due to a
>>> temporary network glitch, it assumes the listener is no longer available and
>>> cancels the event lease.
>> 
>> Are you sure about that?  Looking through com.sun.jini.reggie.RegistrarImpl, it appears
>> that when an exception occurs during event notification, the code tries to categorize the
>> exception as either "definite" (no such event, no such object, etc) or "indefinite" (communications
>> failure).  Then it only cancels the lease on a definite exception.
>> 
>> In other words, the lease is maintained in the case of a temporary network failure.
>> After all, that's the whole point of the lease: it represents an agreement between the client
>> and service that resources are going to be maintained for a definite time period.
>> 
>> Personally, I'd use an internal timer on the client side that says "if I don't receive
>> any events for a given time, I'll cancel the current lease and re-register".  If the events
>> are that quiet, then clearly the registrar is not that heavily loaded, so the overhead of
>> cancelling the lease and creating a new registration should not be too bad.  You'd want to
>> test it under simulated load, of course.
>> 
>> Cheers,
>> 
>> Greg.
>> --
>> Greg Trasuk, President
>> StratusCom Manufacturing Systems Inc. - We use information technology to solve business
>> problems on your plant floor.
>> http://stratuscom.com
>> 
>> 
>> 
> 
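
The client-side timer idea quoted above ("if I don't receive any events for a given time, I'll cancel the current lease and re-register") could be sketched roughly as follows. This is only an illustration under stated assumptions: `EventLivenessWatchdog` and its methods are hypothetical names, and the `reRegister` callback stands in for whatever the client actually does to cancel its event lease and register again with the registrar. No River API is called here.

```java
import java.util.concurrent.ScheduledExecutorService;
import java.util.concurrent.TimeUnit;
import java.util.concurrent.atomic.AtomicLong;

// Hypothetical sketch: fires a re-registration callback when no event has
// been seen for longer than a liveness timeout.
final class EventLivenessWatchdog {
    private final long timeoutNanos;
    private final Runnable reRegister;      // assumed: cancels the lease and re-registers
    private final AtomicLong lastEventNanos;

    EventLivenessWatchdog(long timeoutNanos, Runnable reRegister, long startNanos) {
        this.timeoutNanos = timeoutNanos;
        this.reRegister = reRegister;
        this.lastEventNanos = new AtomicLong(startNanos);
    }

    /** Call from the remote event listener each time a notification arrives. */
    void eventReceived(long nowNanos) {
        lastEventNanos.set(nowNanos);
    }

    /**
     * Runs the callback if the quiet period has reached the timeout.
     * Returns true if the callback was invoked. The quiet-period clock is
     * restarted afterwards so the callback is not re-run on every check.
     */
    boolean check(long nowNanos) {
        if (nowNanos - lastEventNanos.get() < timeoutNanos) {
            return false;
        }
        lastEventNanos.set(nowNanos);
        reRegister.run();
        return true;
    }

    /** Polls check() on the given scheduler, four times per timeout period. */
    void start(ScheduledExecutorService scheduler) {
        long period = Math.max(1L, timeoutNanos / 4);
        scheduler.scheduleWithFixedDelay(() -> {
            try {
                check(System.nanoTime());
            } catch (RuntimeException e) {
                // Swallowed so that one failed re-registration does not
                // cancel the periodic task.
            }
        }, period, period, TimeUnit.NANOSECONDS);
    }
}
```

Passing the time in explicitly keeps the logic easy to test. In real use the client would construct the watchdog with `System.nanoTime()` as the start time, call `eventReceived(System.nanoTime())` from its listener, and call `start(...)` once with a scheduler it owns. As noted in the thread, the timeout would need to be chosen and tested under simulated load.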

