river-dev mailing list archives

From Dan Creswell <...@dcrdev.demon.co.uk>
Subject Re: JavaSpace.notify() "not reliable"
Date Wed, 16 May 2007 09:40:18 GMT
Mark Brouwer wrote:
> Greg Trasuk wrote:
> 
>> On Tue, 2007-05-15 at 15:09, Dan Creswell wrote:
>>
>>> Hmmmmm as yet I'm not clear - what are these NOOP events intended to
>>> convey?
>>>
>>> Is it a liveness test or simply an indication that probably no events
>>> have been dropped or something else?
>>>
>>
>> I can help with that:  Often in real-time or communications systems the
>> protocol sends a message (maybe a byte down a serial line, or a packet
>> over the network) that simply says "I'm here".  These are sometimes
>> called "heartbeat" or "supervisory" packets.  The idea is that there is
>> now a guarantee that some data will be sent within a fixed time
>> interval.  As such, if you go past that time interval, the receiver can
>> reasonably assume the link has failed somehow and take some reasonable
>> action (definitions of reasonable can vary widely).
>>
>> This can be implemented pretty easily thanks to the Jini Remote Event
>> Specification; remember that Jini events are designed to be processed by
>> intermediaries (i.e. RemoteEventListener doesn't actually care what kind
>> of event it sees).  Normally, we think of these intermediaries as
>> network services (like Mercury), but you can also use the concept in the
>> local VM to interpose a listener that just counts idle ticks and takes
>> some fault action when idle ticks exceed a failure threshold.  In
>> Harvester there's a RemoteEventSupervisor class that does just this
>> (abbreviated source attached below).
>>
>> <snip>
>>
>> I'm not sure about ServiceRegistrar; you could argue either way whether
>> ServiceDiscoveryManager should be able to tell if the registrars go
>> away.  But for JavaSpaces, I'd say I'd generally want to know that
>> whatever is generating the entries is alive, rather than just know that
>> the Space is alive.  As such, I'd be more inclined to generate a
>> supervisory entry (which triggers a notify event) rather than have the
>> Space generate a supervisory event.
> 
> Hi Greg, good response, although there might be one point where we
> differ and that is whether the event producer should (indirectly) be
> responsible for sending (what you call) the supervisory event.
> 
> Assume I have an FX rate service on which I can subscribe to receive the
> exchange rate between EUR/USD. The exchange rate service itself obtains
> market rates through Reuters, Bloomberg or any other data provider and
> will perform some validation over those market rates and might apply
> client margins to name a few things for which you might want to write
> such a service.
> 
> In general that exchange rate is rather volatile (say 5 updates a
> second) but there are moments where you have only one update every 3
> seconds or, when the market is closed, no updates at all; for some
> less traded currencies the updates can be tens of seconds apart. Well
> the time the market opens and closes we know in advance, when something
> terrible happens with the market data provider (if you are lucky) you
> can find out as well, so assume for these cases we have as part of our
> event protocol some custom events to signal these cases.
> 
> The one and only problem I still have is that when I don't receive an
> event in say 10 seconds I can't make a judgment about whether the FX
> service has gone away or whether volatility for the particular
> currency pair I'm interested in is just not that high. In such a case
> I would register for events with the constraint that whenever 10
> seconds pass without an event it should send me the supervisory
> event. If I don't receive that event I
> might try to ping it, or I switch to a backup or do a combination of
> both, etc, etc.
> 
> In case there is an intermediary in between the remote events I expect
> it to store and forward those supervisory events as well. I think (but I
> can be wrong) in your case where the supervisor lives near the client
> you can only draw the conclusion that you haven't received an event but
> you don't know whether that was because of the fact the service
> crashed/is overloaded/network broken, etc. or for the simple reason
> there was no update for a particular exchange rate. I don't want to
> start my fault recovery procedure in the latter case so for that reason
> I consider the source as the entity that should notify me of its aliveness.
> 
> Of course all of the above can be developed for each specific event
> protocol, but in my humble experience that has proven to be a repetitive
> task which requires non-trivial logic at the event producer
> side. The watchdog logic at the client side is repetitive as well and a
> lot of people dismiss remote events as being unreliable while I think we
> have the means to alter that notion.
>

Mmmmm, I'm not sure we should alter that notion.  There's more than just
the software at work here.

When we talk about events being unreliable we don't just mean that
services go down etc.  What we're saying is something like:

"You need to figure out what your failure recovery processes are and
design them into your system at human, code and hardware levels".

One thing you might do is the kind of thing you're describing but it's
not the whole picture.
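
For concreteness, the client-side half of that idea might look something
like the sketch below. It is only an illustration: HeartbeatWatchdog and
its methods are hypothetical names, not part of any Jini API, and it
assumes the source sends a supervisory event whenever it has been idle
for the agreed interval. The clock is passed in so the logic can be
exercised without real waiting.

```java
import java.util.function.LongSupplier;

/** Hypothetical client-side watchdog; not part of any Jini API. */
final class HeartbeatWatchdog {
    private final long timeoutMillis;
    private final LongSupplier clock;   // injected time source, in milliseconds
    private long lastSeenMillis;

    HeartbeatWatchdog(long timeoutMillis, LongSupplier clock) {
        this.timeoutMillis = timeoutMillis;
        this.clock = clock;
        this.lastSeenMillis = clock.getAsLong();
    }

    /** Call for every event received, whether real data or supervisory. */
    synchronized void eventReceived() {
        lastSeenMillis = clock.getAsLong();
    }

    /** True when nothing at all has arrived for longer than the timeout. */
    synchronized boolean sourceSuspect() {
        return clock.getAsLong() - lastSeenMillis > timeoutMillis;
    }
}
```

A listener would call eventReceived() from its notify method and some
timer would poll sourceSuspect(). What to do when it returns true (ping,
switch to a backup, page a human) is exactly the part that still has to
be designed per system.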

> For that matter I believe the pattern is that common that I consider it
> a proper (optional) addition to the Jini Distributed Event Model and of
> course together with the 'inverted' event model. When we 'standardize'
> this practice we can develop the client side utilities, we can have
> framework support for the server, but of course you are also allowed to
> do all the heavy lifting yourself ;-). We can write articles about best
> practices, etc, etc. Bottom line is that we should be able to create
> event based solutions for which our friends in the 'you want data, I
> have data' have to write oh so many lines of error prone code to get the
> same level of robustness (or information to base decisions upon).
>

No issue there but I'm not clear on just how deeply baked in this
support needs to be.

> As a last note, I think the addition of such a protocol would be
> particularly useful for ServiceRegistrar. Not only would this enable a
> test for whether callbacks can be received (Jini Distributed Event

Could I not test the ability to do callbacks with something like:

(1)	Register a notify for a special test proxy I'm about to publish
temporarily.

(2)	Register my test proxy.

(3)	See if I get an event.

(4)	If I got an event I'm all done otherwise I'll try a few more times.

etc.
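
Roughly, the steps above could be sketched as follows. ProbeTarget is a
hypothetical stand-in for whatever wraps the real lookup service calls
(it is not a Jini interface), and the attempt count and wait time are
arbitrary parameters chosen by the caller.

```java
import java.util.concurrent.CountDownLatch;
import java.util.concurrent.TimeUnit;

/** Hypothetical wrapper around the lookup service calls the probe needs. */
interface ProbeTarget {
    /** Step 1: ask to be called back when the test proxy shows up. */
    void registerNotify(Runnable onEvent);
    /** Step 2: publish the temporary test proxy. */
    void publishTestProxy();
    /** Withdraw the test proxy and the notify registration. */
    void cleanup();
}

final class CallbackProbe {
    /** Returns true as soon as one attempt yields a callback in time. */
    static boolean callbacksWork(ProbeTarget target, int attempts, long waitMillis) {
        for (int i = 0; i < attempts; i++) {
            CountDownLatch seen = new CountDownLatch(1);
            try {
                target.registerNotify(seen::countDown);
                target.publishTestProxy();
                // Step 3: wait a bounded time for the event.
                if (seen.await(waitMillis, TimeUnit.MILLISECONDS)) {
                    return true;
                }
            } catch (InterruptedException e) {
                Thread.currentThread().interrupt();
                return false;
            } finally {
                target.cleanup();
            }
            // Step 4: no event this time, so go round again.
        }
        return false;
    }
}
```

A false result only says that no callback arrived within the attempts
made; it doesn't say why.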

> Protocol) but it also allows faster detection of whether the state
> you have for a lookup service based on the events received can be
> trusted, compared to waiting for a lookup service to be discarded. As
> such if this optional protocol was available I would like to see SDM
> being modified to take (optionally) advantage of this mechanism.

