river-dev mailing list archives

From Dan Creswell <...@dcrdev.demon.co.uk>
Subject Re: SourceAliveRemoteEvent Part II
Date Fri, 01 Jun 2007 10:11:40 GMT
So I'm still confused about what the exact use cases are here, so I'll guess:

One seems related to firewall traversal etc. and whether or not callbacks
can be performed.

A second appears to be attempting to divine whether or not you've lost
some events.

A third appears to be attempting to divine the health of a remote service.

Mark Brouwer wrote:
> Hi Dan,
> Dan Creswell wrote:
>> Hi all,
>> It started with a discussion under the Javaspaces.notify() not reliable
>> conversation and I've now had a bit more time to formulate my thoughts.
>> Without this extra feature we do something like the following in the
>> client:
>> (1)    Setup a watchdog timer with a suitable expiry
>> (2)    On receiving a remote event, reset our watchdog timer
>> (3)    If timer expires, check to see if our source is still alive, check
>> to see if we might've missed an event.
>> What's being proposed, if I understand correctly, is that the source, if
>> it's alive and hasn't generated events in a particular time period,
>> confirms that by posting a SourceAliveRemoteEvent to the client.
> The idea has 3 aspects:
> 1) the SourceAliveRemoteEvent (SARE) protocol is triggered by a
>    QoS invocation constraints set upon registration;
> 2) the source must send a SARE as the first event (this is helpful in
>    finding out whether callbacks are possible);
> 3) the source should send a SARE in case a certain time after the last
>    remote event sent has elapsed.
> Below I will try to clarify why I consider this having advantages over
> performing a ping.
>> This would potentially change the above client code to reset the timer
>> on just a SourceAliveRemoteEvent (SARE).
>> Things of note:
>> (1)    The original solution places the responsibility and load on the
>> client (bar the pinging of the server).  This naturally scales out quite
>> well as the server only has to respond to pings and chances are a client
>> only maintains timers for a few services.  If client timeouts are tuned
>> appropriately to event frequency/typical pause, pings will be rare.
> The SARE protocol is 'triggered' based on a QoS invocation constraint,
> i.e. only clients that have interest in SAREs will register for
> receiving them with their event registration. A server won't be sending
> SAREs for those who have shown no interest, also the constraints can be
> rejected in case the timeout period requested would be too small and
> the server wants to refuse, i.e. the server has a say in the 'tuning'.
> Preventing a client from invoking ping too often because it has set a
> very small time-out seems to be much harder to control.
> I must say it really depends on what the ping constitutes before I would
> be able to say ping is a trivial operation for the server.
>> (2)    The new solution places much of the responsibility with the
>> server.
>>  I believe there may be a scaling problem here.  In contrast to the
>> client-side approach a server might have a large number of clients to
>> cope with.  This potentially means the server has significant load
>> tracking a large number of timer events for all its clients and posting
>> SAREs in addition to what it already does.
> No denial the proposal brings additional complexity to those services
> that wish to support the constraint.
> I've been implementing SARE in Seven last week and I have it working,
> the event framework became more complex although due to experience in
> building a few of these similar mechanisms at the application layer I
> was able to make some optimizations in the code that gives me the
> impression the overhead is quite minimal assuming a time-out is used
> that relates to the average expected event rate.
> Therefore I'm not that afraid of scalability issues given the fact the
> time-out period is expected to be in line with and probably larger than
> the event rate at which you will be sending events. Or in other words,
> the time-out is likely only small in case you expect a high remote event
> frequency, meaning SAREs won't be sent that often. If they are, your
> server is likely capable of dealing with a large number of events anyway.
> And on the positive side one must find a proper usage for all these
> multi-core/CMT CPUs coming our way.
>> (3)    The only difference between old and new approach from a client
>> coding perspective is what causes a reset of the watchdog timer.
> For a client it seems to me SARE is easier than performing a remote
> method invocation (ping) that might take some time to return. With SARE
> I expect few or no ordinary remote method invocations (pings) to take
> place, so clients are less likely to have to take the additional
> round-trip time (and the possibility of timing out) of these calls into
> account (the calls are exceptions and not the norm). In the ping case
> your timer will probably hand off to something that performs the ping
> asynchronously to avoid interfering with the timer itself.

Yes my timer will indeed hand off but most of what's needed is already
in the JDK.  About all I need to do is write the logic to allow a
programmer to pass the RemoteEvent stream through the watchdog and
provide some kind of callback to invoke if the stream appears to have
been interrupted.

And for the client-side approach as per your solution, pinging or some
other client action will only be triggered in the case where no remote
events arrived in the programmer's defined time period such that the
watchdog fired.
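
As a rough illustration of the client-side watchdog described above, a
minimal sketch using only the JDK's ScheduledExecutorService might look
like the following. The names EventWatchdog, eventReceived() and
onSilence are made up for this example and are not part of Jini or of
the SARE proposal. The assumption is that the client's remote event
listener calls eventReceived() for every event it is handed, and that
onSilence hands any slow follow-up work (such as a ping) to another
thread.

```java
import java.util.concurrent.Executors;
import java.util.concurrent.ScheduledExecutorService;
import java.util.concurrent.ScheduledFuture;
import java.util.concurrent.TimeUnit;

/** Hypothetical client-side watchdog; an illustration only. */
final class EventWatchdog implements AutoCloseable {
    // Single daemon thread that runs the expiry callback.
    private final ScheduledExecutorService timer =
            Executors.newSingleThreadScheduledExecutor(r -> {
                Thread t = new Thread(r, "event-watchdog");
                t.setDaemon(true);
                return t;
            });
    private final long timeoutMillis;
    private final Runnable onSilence;
    private ScheduledFuture<?> pending; // guarded by "this"

    /** (1) Set up the watchdog with a suitable expiry and arm it. */
    EventWatchdog(long timeoutMillis, Runnable onSilence) {
        this.timeoutMillis = timeoutMillis;
        this.onSilence = onSilence;
        reset();
    }

    /** (2) Call this for every remote event received; it resets the timer. */
    synchronized void eventReceived() {
        reset();
    }

    // Cancel the pending expiry, if any, and schedule a fresh one.
    // (3) If no event arrives within timeoutMillis, onSilence runs once;
    // the watchdog is re-armed only by the next call to eventReceived().
    private synchronized void reset() {
        if (pending != null) {
            pending.cancel(false);
        }
        if (!timer.isShutdown()) {
            pending = timer.schedule(onSilence, timeoutMillis, TimeUnit.MILLISECONDS);
        }
    }

    /** Stop the watchdog; no further callbacks are scheduled. */
    @Override
    public synchronized void close() {
        timer.shutdownNow();
    }
}
```

The callback only tells the client that the stream went quiet for the
chosen period; whether that means lost events, a dead source or simply
no activity is still for the client to work out.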

The key difference is that my client might erroneously decide to switch.
However, I don't see SARE solving that problem because it's nigh on
impossible to guarantee the event will arrive at all and/or on time.

At this stage I'm trying to fathom how generating an additional event,
which can itself be lost or not delivered in a timely fashion, is of much
use in dealing with an environment that loses events in general.

> When your watchdog goes off with SARE you know some QoS criterion hasn't
> been met by your source, versus having to figure out whether it did send
> events which haven't arrived. In many cases with SARE you won't perform a
> request to your source, you might go straight for a backup service and
> ignore the service altogether, or you ring the alarm bell of some
> Network Operations Center. But of course there will be cases you want to
> be a bit more persistent about your event registration.
>> (4)    SAREs, like any other event, can be lost - if one is lost the client
>> watchdog will trigger just as it would in the old approach given
>> sufficient time between RemoteEvents.
> Indeed it is possible a SARE will be lost. Although for most types
> of services I've coded (no multiple hops and no event payload provided
> by "I mess up the codebase clients") the chance a SARE will be forever
> lost due to a transitory failure I consider small compared to the other
> expected failures.
>> (5)    If the source has sent events but they've been lost it won't
>> send a SARE and, again, the client watchdog will time out and ping.
>> Based on the above it seems to me that whilst a SARE might save a few
>> pings there's additional complexity and greater server load.  If I've
>> missed some subtleties, please shout because right now I don't see
>> enough benefit in this to justify the "pain".
> So far I'm not sure what exactly you mean by a 'ping' in the above. Is
> it just a way to check whether the service is alive, or do you envision
> more, something that has a correlation with the event registration and
> internal event framework and that can say meaningful things about its
> ability to deliver events? If it is only something to check whether the
> service is alive/reachable I consider SARE a much richer concept for
> getting info about the ability to deliver events, also because it
> follows the exact route of event delivery. Ping doesn't represent the
> invocation path in case of Jini Distributed Events which (especially in
> the case of security and network topology) might be failing just because
> of these differences.
> In the proposal I also use SARE as the first event to be sent to
> verify whether a callback is possible, so besides a 'source alive' it
> also serves another purpose, namely to find out whether event delivery
> can work at all.
> One thing we haven't covered yet is that a ping for reachability might
> be successful even while the source is not able to deliver events in a
> timely manner due to being overloaded, deadlocks, etc., while SARE will show
> the source is not able to deliver events properly. As such it tells
> me more about the state the event producer is in and its ability to
> serve me.

I agree ping can be misleading but I don't see how SARE really helps -
the absence of a SARE arriving leaves you wondering what exactly
happened.  Did you get overloaded, did you lose an event, did the server
fail?  It seems to me SARE is a hint, just as the results of a ping are a
hint, and just as the timeout due to lack of arriving events that drives
the need to ping is a hint.

> To conclude, a ping (assuming in its simple and generic form) doesn't
> give me enough information about the capabilities of a source to deliver
> events, where SARE can do this better. Yes it will lead to complexity at
> the server, maybe a slight reduction in scalability, but in most cases a
> simplification at the client side and the ability to get indications you
> won't be able to get with ping.
> I'm not saying this is the only way, but to me it represents a pattern I
> have often used and see value in being part of the standard toolbox, but
> so might mechanisms to test for reachability/availability (the ones
> Dennis mentioned).

All understood - it's precisely why I'm asking you the questions - to
determine what might be best (which includes subjective measures of
simplicity, reusability, bang for buck etc.).

> My hope is that the common patterns people use can be
> standardized/formalized so that we see more support for them, either
> through frameworks, utilities or whatever people like to see or fit to
> them. But at least in a way they don't stay proprietary in many small
> corners of the Jini empire.

No issue with that!

