river-dev mailing list archives

From Greg Trasuk <tras...@stratuscom.com>
Subject Re: River-examples project - followup
Date Mon, 06 Apr 2015 19:30:01 GMT

Hi Dan:

Thanks for the great feedback.  

I’m pretty sure you already know this, Dan, since you’re a long-time Jini user, but let
me explain for the newer folks and the archives.  This is a case where what you’re seeing
is the expected behaviour.  When the service registers itself with Reggie, it takes out a
lease on the registration. That lease is usually renewed periodically by the service’s JoinManager
(that isn’t quite the whole story, but it’ll do for now).  When you kill the service unexpectedly
with ctrl-c, the service doesn’t de-register itself.  However, the lease eventually runs out
(now that it’s not being renewed by the service), and then the registration expires, allowing
Reggie to reclaim its resources and notify any registrar listeners.
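
To make the mechanism concrete, here is a minimal sketch of a lease table in plain Java. It is not Reggie's code and does not use the River API: the LeaseTable class, its method names, and the lease durations are all hypothetical, and time is passed in explicitly so the behaviour is easy to follow.

```java
import java.util.ArrayList;
import java.util.HashMap;
import java.util.Iterator;
import java.util.List;
import java.util.Map;

// Hypothetical stand-in for a lease-granting registrar; not Reggie's actual code.
class LeaseTable {
    // registration id -> absolute expiry time in milliseconds
    private final Map<String, Long> expiry = new HashMap<>();

    // Register a new entry or renew an existing one.
    // The lease runs for durationMs starting at nowMs.
    void renew(String id, long nowMs, long durationMs) {
        expiry.put(id, nowMs + durationMs);
    }

    // Drop every entry whose lease has run out and return the ids that were dropped.
    List<String> reap(long nowMs) {
        List<String> dropped = new ArrayList<>();
        Iterator<Map.Entry<String, Long>> it = expiry.entrySet().iterator();
        while (it.hasNext()) {
            Map.Entry<String, Long> entry = it.next();
            if (entry.getValue() <= nowMs) {
                dropped.add(entry.getKey());
                it.remove();
            }
        }
        return dropped;
    }

    // Number of registrations currently held.
    int size() {
        return expiry.size();
    }
}
```

In Dan's scenario, the killed instance simply stops calling renew, the restarted instance registers under a second entry, and both stay visible until a later reap finds the first lease expired.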

It would be possible to register a VM shutdown hook to de-register the service before the
VM exits, but in this case I think it’s actually better to leave it out, since it demonstrates
nicely that a dead service (or at least a dead JoinManager) eventually gets dropped from
the registrar.
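
For anyone who does want the hook, a minimal sketch using the standard Runtime.addShutdownHook call might look like the following. The ShutdownDeregistration class and its Registration interface are hypothetical placeholders; in a real River service the cancel step would presumably be something like terminating the JoinManager, which is an assumption here rather than what the example project does.

```java
// Sketch only: installs a JVM shutdown hook that cancels a registration.
class ShutdownDeregistration {

    // Hypothetical stand-in for whatever cancels the registration lease.
    interface Registration {
        void cancel();
    }

    // Registers the hook with the JVM and returns it so callers can remove it later.
    static Thread install(Registration registration) {
        Thread hook = new Thread(registration::cancel, "deregister-on-exit");
        Runtime.getRuntime().addShutdownHook(hook);
        return hook;
    }
}
```

Shutdown hooks run on a normal exit or ctrl-c, but not if the process is killed outright (for example with kill -9) or the machine loses power, so lease expiry is still the backstop.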

You said the duplicate service instances “worked”, in that you can show info and browse
the service, but of course, you’re really just looking at the information that’s in the
registry - the registrar and service browser don’t actually contact the service.  Reggie
has no knowledge of the “liveness” of the service, and doesn’t attempt to do any “health
check”.  

In fact, it’s a common misconception that if the service renews the lease, it must be “live”.
 This turns out to be false for many reasons.  (1) The service could have delegated its lease
renewals to a different service.  (2) There’s no guarantee that failure of the actual service
thread would also cause failure of the lease renewal thread, even if they are in the same
process (embedded programmers might recognize this as being similar to the “resetting the
watchdog in a timer-triggered interrupt service routine” problem).  (3) Even if there were
a health check task, the service could fail in the instant just after the health check.  The
most a health check, monitor or heartbeat can do is place a limit on how long it takes to
find out a service has failed.  The only way to say with certainty that a service “works”
is to attempt to use it.
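
Point (2) is easy to reproduce. The sketch below is plain Java with hypothetical names (RenewalOutlivesWorker, demo) and no River classes: a "service" thread dies, while a separate scheduled task standing in for lease renewal carries on regardless.

```java
import java.util.concurrent.CountDownLatch;
import java.util.concurrent.Executors;
import java.util.concurrent.ScheduledExecutorService;
import java.util.concurrent.TimeUnit;

// Sketch only: shows that a renewal task keeps running after the worker thread is gone.
class RenewalOutlivesWorker {

    // Returns true if three "renewals" happened after the worker thread had already died.
    static boolean demo() {
        // The worker exits at once to simulate a failed service thread.
        Thread worker = new Thread(() -> { }, "service-worker");
        CountDownLatch renewals = new CountDownLatch(3);
        ScheduledExecutorService renewer = Executors.newSingleThreadScheduledExecutor();
        try {
            worker.start();
            worker.join(); // the worker is dead from here on

            // Each run stands in for one lease renewal.
            renewer.scheduleAtFixedRate(renewals::countDown, 0, 10, TimeUnit.MILLISECONDS);
            boolean renewed = renewals.await(5, TimeUnit.SECONDS);
            return renewed && !worker.isAlive();
        } catch (InterruptedException e) {
            Thread.currentThread().interrupt();
            return false;
        } finally {
            renewer.shutdownNow();
        }
    }
}
```

From the registrar's side the renewals look exactly the same whether or not the worker is alive, which is why a renewed lease says nothing about whether the service can still do its job.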

The lease is purely for the convenience of the registrar (or generically, the service granting
the lease).  If ever the lease is not renewed, the landlord can go ahead and reclaim whatever
resources were dedicated to the lease.  In the case of Reggie, if the lease isn’t renewed,
Reggie drops the registration.  So there’s little risk of “stuck registrations”.  And
since the lease can be renewed, there’s no need for any kind of extended default timeout.

So, I think I’ll put most of the above explanation into the tutorial, unless anyone has
other thoughts.

Cheers,

Greg Trasuk

On Apr 6, 2015, at 1:42 PM, Dan Rollo <danrollo@gmail.com> wrote:

> Hi Greg,
> 
> I finally took some time to try this out. It really looks great to me!
> 
> I noticed one minor thing that I thought might confuse users: While going through tutorial
> steps, I decided to stop (via ctrl+c) and restart the hello-service a couple times. This
> resulted in the service being shown multiple times in the service browser (screenshot attached).
> It appeared all the duplicate instances in the browser “worked” (I could “show info”
> and “browse service” on all of them). Eventually, the duplicate registrations “cleaned
> up” and I was left with just one. I’m not sure how best to avoid confusion about this
> situation. Would more doc about “why”/“how” that works just complicate things? Is
> there any sort of “force lease check” to do in the browser that could clear up the duplicates
> sooner? (And if so, would that be worth noting in the tutorial?). So basically, not sure this
> is a “problem”, but thought I’d ask…
> 
> Thanks!
> Dan
> 
> <revier-examples-RepeatedService.png>

