river-user mailing list archives

From "Christopher Dolan" <christopher.do...@avid.com>
Subject Registrar lease timeouts, service liveness
Date Tue, 26 Oct 2010 19:00:16 GMT
What lease timeouts are people using with Reggie?

My project currently uses a 10 minute timeout.  We chose that value as a
balance between 1) wanting to know quickly when a service crashes and 2)
performance concerns with the Reggie implementation.

I've become dissatisfied with that compromise, however, particularly in
cases where the service is actually live but the registrar has gone bad
or a clock sync anomaly has occurred (both cause false negatives).
Ideally I would like to disentangle the notion of an expected service
and the liveness of that service.  That is, I would like to be able to
query the registrar separately for all of the services that are supposed
to be running and all of the services that are actually running right
now.

Take for example a collection of redundant services intended to be used
round-robin.  I want clients to prefer to contact only the services
known to be alive to avoid TCP timeouts.  But if the registrar thinks
all of them are down, I still want clients to try to contact them just
in case the registrar is wrong.  So, I don't want the services to be
removed from the LookupCache completely.
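
The selection rule in the paragraph above can be sketched in plain Java.
This is only an illustration: Candidate and PreferLiveRoundRobin are
hypothetical names, not Jini types, and the "alive" flag is assumed to
come from whatever liveness mechanism ends up being used.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.concurrent.atomic.AtomicInteger;

/** Hypothetical stand-in for a cached service entry; not a Jini type. */
final class Candidate {
    final String id;
    final boolean alive;

    Candidate(String id, boolean alive) {
        this.id = id;
        this.alive = alive;
    }
}

/** Round-robin over live candidates, falling back to all known ones. */
final class PreferLiveRoundRobin {
    private final AtomicInteger next = new AtomicInteger();

    /** Returns the next candidate, or null if none are known at all. */
    Candidate pick(List<Candidate> known) {
        List<Candidate> live = new ArrayList<>();
        for (Candidate c : known) {
            if (c.alive) {
                live.add(c);
            }
        }
        // If the registrar believes everything is down, try everything
        // anyway, in case the registrar is wrong.
        List<Candidate> pool = live.isEmpty() ? known : live;
        if (pool.isEmpty()) {
            return null;
        }
        int index = Math.floorMod(next.getAndIncrement(), pool.size());
        return pool.get(index);
    }
}
```

The fallback only works if "not alive" services stay in the cache, which
is why removing them from the LookupCache outright is a problem.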

I've considered adding an Entry to the service's attributeSets that says
whether the service is alive, and setting the registration lease duration
to be very long.  In that case, I would need to alter Reggie to fill in
that attribute as "missing" when a service failed to check in before a
liveness timeout, without actually cancelling the service lease.  With an
implementation like that, it would be trivial for me to pick out the
live services with a simple ServiceItemFilter on the LookupCache.

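
A rough sketch of what such a filter might look like.  To keep it
self-contained, Entry, ServiceItem and ServiceItemFilter below are
minimal stand-ins meant to mirror the shape of the Jini types of the
same names; they are not the real classes.  The Liveness entry and its
"alive" field are hypothetical, and nothing here shows the Reggie change
that would be needed to maintain the attribute.

```java
/** Stand-in for net.jini.core.entry.Entry (a marker interface). */
interface Entry {}

/** Stand-in for net.jini.core.lookup.ServiceItem, reduced to two fields. */
final class ServiceItem {
    final Object service;
    final Entry[] attributeSets;

    ServiceItem(Object service, Entry[] attributeSets) {
        this.service = service;
        this.attributeSets = attributeSets;
    }
}

/** Stand-in for net.jini.lookup.ServiceItemFilter. */
interface ServiceItemFilter {
    boolean check(ServiceItem item);
}

/** Hypothetical liveness attribute; null models the "missing" value. */
final class Liveness implements Entry {
    public Boolean alive;

    Liveness(Boolean alive) {
        this.alive = alive;
    }
}

/** Accepts only items carrying a Liveness entry whose value is TRUE. */
final class LiveOnlyFilter implements ServiceItemFilter {
    @Override
    public boolean check(ServiceItem item) {
        if (item.attributeSets == null) {
            return false;
        }
        for (Entry e : item.attributeSets) {
            if (e instanceof Liveness
                    && Boolean.TRUE.equals(((Liveness) e).alive)) {
                return true;
            }
        }
        // No Liveness entry, or its value is missing: treat as not alive.
        return false;
    }
}
```

A lookup without this filter would still return every registered
service, which covers the "supposed to be running" query.
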
Another idea is to implement this client side: use a short lease timeout
with Reggie but add some longer-term caching to the LookupCache.  In
that case, a serviceRemoved() from a registrar would simply flag the
ServiceItemReg as not alive.  The service would not be removed from the
LookupCache, however, until N hours after it was removed from the last
registrar.
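
The client-side variant could be sketched roughly as below.  GraceCache
and its method names are hypothetical; the sketch assumes onAdded() and
onRemoved() would be driven by the cache's discovery callbacks, and it
keys services by a plain String rather than a real ServiceID.

```java
import java.time.Duration;
import java.time.Instant;
import java.util.ArrayList;
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;

/** Keeps removed services around, flagged not alive, for a grace period. */
final class GraceCache {
    private static final class Slot {
        boolean alive = true;
        Instant removedAt; // null while the service is considered alive
    }

    private final Map<String, Slot> slots = new LinkedHashMap<>();
    private final Duration grace;

    GraceCache(Duration grace) {
        this.grace = grace;
    }

    /** A (re)discovered service is stored as alive. */
    synchronized void onAdded(String id) {
        slots.put(id, new Slot());
    }

    /** A removal only flags the service; it is not dropped yet. */
    synchronized void onRemoved(String id, Instant now) {
        Slot s = slots.get(id);
        if (s != null && s.alive) {
            s.alive = false;
            s.removedAt = now;
        }
    }

    /** Drops services that have been flagged for at least the grace period. */
    synchronized void evictExpired(Instant now) {
        slots.values().removeIf(
                s -> !s.alive && !now.isBefore(s.removedAt.plus(grace)));
    }

    /** Returns ids in insertion order, optionally only the live ones. */
    synchronized List<String> ids(boolean liveOnly) {
        List<String> out = new ArrayList<>();
        for (Map.Entry<String, Slot> e : slots.entrySet()) {
            if (!liveOnly || e.getValue().alive) {
                out.add(e.getKey());
            }
        }
        return out;
    }
}
```

Here the grace period plays the role of the "N hours", and
evictExpired() would need to be called periodically, for example from a
timer.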

Has anybody else had similar thoughts?  What compromises, extensions
and/or architectures have you chosen as a result? 

