river-dev mailing list archives

From Mark Brouwer <mark.brou...@cheiron.org>
Subject Re: JavaSpace.notify() "not reliable"
Date Wed, 16 May 2007 09:14:51 GMT
Greg Trasuk wrote:

> On Tue, 2007-05-15 at 15:09, Dan Creswell wrote:
>
>> Hmmmmm as yet I'm not clear - what are these NOOP events intended to convey?
>>
>> Is it a liveness test or simply an indication that probably no events
>> have been dropped or something else?
>>
> 
> I can help with that:  Often in real-time or communications systems the
> protocol sends a message (maybe a byte down a serial line, or a packet
> over the network) that simply says "I'm here".  These are sometimes
> called "heartbeat" or "supervisory" packets.  The idea is that there is
> now a guarantee that some data will be sent within a fixed time
> interval.  As such, if you go past that time interval, the receiver can
> reasonably assume the link has failed somehow and take some reasonable
> action (definitions of reasonable can vary widely).
> 
> This can be implemented pretty easily thanks to the Jini Remote Event
> Specification; remember that Jini events are designed to be processed by
> intermediaries (i.e. RemoteEventListener doesn't actually care what kind
> of event it sees).  Normally, we think of these intermediaries as
> network services (like Mercury), but you can also use the concept in the
> local VM to interpose a listener that just counts idle ticks and takes
> some fault action when idle ticks exceed a failure threshold.  In
> Harvester there's a RemoteEventSupervisor class that does just this
> (abbreviated source attached below).
>
> <snip>
>
> I'm not sure about ServiceRegistrar; you could argue either way whether
> ServiceDiscoveryManager should be able to tell if the registrars go
> away.  But for JavaSpaces, I'd say I'd generally want to know that
> whatever is generating the entries is alive, rather than just know that
> the Space is alive.  As such, I'd be more inclined to generate a
> supervisory entry (which triggers a notify event) rather than have the
> Space generate a supervisory event.

Hi Greg, good response, although there might be one point where we
differ: whether the event producer should (indirectly) be responsible
for sending what you call the supervisory event.

Assume I have an FX rate service to which I can subscribe to receive
the EUR/USD exchange rate. The exchange rate service itself obtains
market rates through Reuters, Bloomberg or any other data provider,
performs some validation of those market rates and might apply client
margins, to name a few of the things for which you might want to write
such a service.

In general that exchange rate is rather volatile (say 5 updates a
second), but there are moments when you get only one update every 3
seconds, and when the market is closed you get no updates at all. For
some less-traded currencies the updates can be tens of seconds apart.
We know in advance when the market opens and closes, and when something
terrible happens with the market data provider you can (if you are
lucky) find that out as well, so assume our event protocol includes
some custom events to signal these cases.

The one problem I still have is that when I don't receive an event for,
say, 10 seconds, I can't judge whether the FX service is gone or whether
volatility for the currency pair I'm interested in is simply not that
high. In such a case I would register for events with the constraint
that whenever 10 seconds pass without an event, the service sends me
the supervisory event. If I don't receive that event either, I might
try to ping the service, switch to a backup, do a combination of both,
etc.
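
A minimal sketch of that client-side check, assuming a 10 second
guarantee from the service. EventWatchdog and its methods are
hypothetical names for illustration, not part of any Jini API, and the
clock is injected only so the logic can be exercised without sleeping:

```java
import java.util.function.LongSupplier;

/** Hypothetical client-side watchdog; not part of any Jini API. */
final class EventWatchdog {
    private final long timeoutMillis;
    private final LongSupplier clock;   // assumed time source, in milliseconds
    private long lastSeenMillis;

    EventWatchdog(long timeoutMillis, LongSupplier clock) {
        this.timeoutMillis = timeoutMillis;
        this.clock = clock;
        this.lastSeenMillis = clock.getAsLong();
    }

    /** Call for every event received, real update or supervisory event alike. */
    synchronized void eventReceived() {
        lastSeenMillis = clock.getAsLong();
    }

    /** True when nothing at all has arrived for longer than the timeout. */
    synchronized boolean sourceSuspect() {
        return clock.getAsLong() - lastSeenMillis > timeoutMillis;
    }
}
```

A listener would call eventReceived() from its notify method and poll
sourceSuspect() from a timer; what happens next (ping, switch to a
backup, or both) stays a client decision.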

If there is an intermediary between the event source and me, I expect
it to store and forward those supervisory events as well. I think (but I
can be wrong) that in your case, where the supervisor lives near the
client, you can only conclude that you haven't received an event; you
don't know whether that is because the service crashed, is overloaded,
or the network is broken, or simply because there was no update for a
particular exchange rate. I don't want to start my fault recovery
procedure in the latter case, so for that reason I consider the source
to be the entity that should notify me of its aliveness.
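
To make the source-side variant concrete, here is a rough sketch in
which the producer itself emits a supervisory event once it has been
idle for the agreed interval. HeartbeatingProducer, SUPERVISORY and the
in-memory list standing in for the remote listener are all assumptions
made for illustration; nothing here is an existing Jini class:

```java
import java.util.ArrayList;
import java.util.List;

/** Hypothetical producer that fills idle periods with supervisory events. */
final class HeartbeatingProducer {
    static final String SUPERVISORY = "SUPERVISORY";

    private final long idleMillis;
    private long lastSentMillis;
    // Stands in for delivery to a remote listener (assumption for the sketch).
    private final List<String> sent = new ArrayList<>();

    HeartbeatingProducer(long idleMillis, long nowMillis) {
        this.idleMillis = idleMillis;
        this.lastSentMillis = nowMillis;
    }

    /** Sends a real update and restarts the idle interval. */
    synchronized void publish(String update, long nowMillis) {
        sent.add(update);
        lastSentMillis = nowMillis;
    }

    /** Called periodically; sends a supervisory event only after a full idle interval. */
    synchronized void tick(long nowMillis) {
        if (nowMillis - lastSentMillis >= idleMillis) {
            sent.add(SUPERVISORY);
            lastSentMillis = nowMillis;
        }
    }

    synchronized List<String> sent() {
        return new ArrayList<>(sent);
    }
}
```

A real service would drive tick() from a timer, for example a
java.util.concurrent.ScheduledExecutorService, and keep one idle
interval per registration, since each client may ask for a different
constraint.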

Of course all of the above can be developed for each specific event
protocol, but in my humble experience that has proven to be a repetitive
task which requires non-trivial logic on the event producer side. The
watchdog logic on the client side is repetitive as well, and a lot of
people dismiss remote events as being unreliable, while I think we have
the means to alter that notion.

For that matter I believe the pattern is common enough that I consider
it a proper (optional) addition to the Jini Distributed Event Model,
together of course with the 'inverted' event model. When we
'standardize' this practice we can develop the client-side utilities
and have framework support for the server, but of course you are also
allowed to do all the heavy lifting yourself ;-). We can write articles
about best practices, etc. The bottom line is that we should be able to
create event-based solutions with a level of robustness (or information
to base decisions upon) for which our friends in the 'you want data, I
have data' camp have to write oh so many lines of error-prone code.

As a last note, I think the addition of such a protocol would be
particularly useful for ServiceRegistrar. Not only would it enable a
test of whether callbacks can be received (Jini Distributed Event
Protocol), it would also allow faster detection of whether the state
you hold for a lookup service, based on the events received, can be
trusted, compared to waiting for the lookup service to be discarded. As
such, if this optional protocol were available, I would like to see SDM
modified to (optionally) take advantage of this mechanism.
-- 
Mark
