From: Mark Brouwer <mark.brouwer@cheiron.org>
To: River Dev <river-dev@incubator.apache.org>
Date: Wed, 16 May 2007 11:14:51 +0200
Subject: Re: JavaSpace.notify() "not reliable"
Message-ID: <464ACB8B.10302@cheiron.org>
In-Reply-To: <1179260915.11670.54.camel@cameron>

Greg Trasuk wrote:
> On Tue, 2007-05-15 at 15:09, Dan Creswell wrote:
>
>> Hmmmmm as yet I'm not clear - what are these NOOP events intended
>> to convey?
>>
>> Is it a liveness test or simply an indication that probably no events
>> have been dropped or something else?
>
> I can help with that: Often in real-time or communications systems the
> protocol sends a message (maybe a byte down a serial line, or a packet
> over the network) that simply says "I'm here". These are sometimes
> called "heartbeat" or "supervisory" packets. The idea is that there is
> now a guarantee that some data will be sent within a fixed time
> interval. As such, if you go past that time interval, the receiver can
> reasonably assume the link has failed somehow and take some reasonable
> action (definitions of reasonable can vary widely).
>
> This can be implemented pretty easily thanks to the Jini Remote Event
> Specification; remember that Jini events are designed to be processed
> by intermediaries (i.e. RemoteEventListener doesn't actually care what
> kind of event it sees). Normally, we think of these intermediaries as
> network services (like Mercury), but you can also use the concept in
> the local VM to interpose a listener that just counts idle ticks and
> takes some fault action when idle ticks exceed a failure threshold.
> In Harvester there's a RemoteEventSupervisor class that does just this
> (abbreviated source attached below).
>
> I'm not sure about ServiceRegistrar; you could argue either way whether
> ServiceDiscoveryManager should be able to tell if the registrars go
> away.
> But for JavaSpaces, I'd say I'd generally want to know that whatever
> is generating the entries is alive, rather than just know that the
> Space is alive. As such, I'd be more inclined to generate a
> supervisory entry (which triggers a notify event) rather than have the
> Space generate a supervisory event.

Hi Greg,

Good response, although there might be one point where we differ, and that is whether the event producer should (indirectly) be responsible for sending what you call the supervisory event.

Assume I have an FX rate service with which I can subscribe to receive the EUR/USD exchange rate. The exchange rate service itself obtains market rates from Reuters, Bloomberg or any other data provider, performs some validation on those market rates, and might apply client margins, to name a few of the things for which you might want to write such a service. In general that exchange rate is rather volatile (say 5 updates a second), but there are moments when you have only one update every 3 seconds, and when the market is closed you have no updates at all; for some less-traded currencies the updates can be tens of seconds apart.

Now, the times the market opens and closes we know in advance, and when something terrible happens to the market data provider you can (if you are lucky) find that out as well, so assume that for these cases our event protocol includes some custom events to signal them. The one and only problem I still have is that when I don't receive an event for, say, 10 seconds, I can't judge whether the FX service has gone away or whether volatility for the particular currency pair I'm interested in is simply low. In such a case I would register for events with the constraint that whenever 10 seconds pass without an event, the service should send me a supervisory event. If I don't receive that event I might try to ping it, or switch to a backup, or do a combination of both, etc.
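The producer-side behaviour described above can be sketched roughly as follows. This is a minimal illustration, not part of any Jini API: the Publisher interface and all names here are hypothetical stand-ins for the remote-event delivery call a real service would make.

```java
import java.util.concurrent.Executors;
import java.util.concurrent.ScheduledExecutorService;
import java.util.concurrent.TimeUnit;

/**
 * Sketch: a producer that sends a supervisory (heartbeat) event whenever
 * no real event has gone out within maxSilenceMillis, so subscribers can
 * distinguish "no updates" from "producer gone". Publisher is a
 * hypothetical stand-in, not a Jini interface.
 */
public class SupervisedPublisher {
    public interface Publisher { void publish(Object event); }

    /** Marker object standing in for a supervisory/NOOP event. */
    public static final Object SUPERVISORY = new Object();

    private final Publisher sink;
    private final long maxSilenceMillis;
    private long lastSentMillis;
    private final ScheduledExecutorService timer =
            Executors.newSingleThreadScheduledExecutor();

    public SupervisedPublisher(Publisher sink, long maxSilenceMillis) {
        this.sink = sink;
        this.maxSilenceMillis = maxSilenceMillis;
        this.lastSentMillis = System.currentTimeMillis();
        // Periodically check whether the silence threshold was exceeded.
        timer.scheduleAtFixedRate(this::maybeSendSupervisory,
                maxSilenceMillis, maxSilenceMillis, TimeUnit.MILLISECONDS);
    }

    /** Publish a real event; any real event resets the silence timer. */
    public synchronized void publish(Object event) {
        lastSentMillis = System.currentTimeMillis();
        sink.publish(event);
    }

    private synchronized void maybeSendSupervisory() {
        if (System.currentTimeMillis() - lastSentMillis >= maxSilenceMillis) {
            lastSentMillis = System.currentTimeMillis();
            sink.publish(SUPERVISORY);
        }
    }

    public void shutdown() { timer.shutdownNow(); }
}
```

A real implementation would of course honour the per-registration interval the subscriber asked for (10 seconds in the example above) rather than a single global one.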
In case there is an intermediary in between the remote events, I expect it to store and forward those supervisory events as well.

I think (but I could be wrong) that in your case, where the supervisor lives near the client, you can only conclude that you haven't received an event; you don't know whether that was because the service crashed, is overloaded, the network broke, etc., or for the simple reason that there was no update for a particular exchange rate. I don't want to start my fault recovery procedure in the latter case, so for that reason I consider the source the entity that should notify me of its aliveness.

Of course all of the above can be developed for each specific event protocol, but in my humble experience that has proven to be a repetitive task which requires less-than-trivial logic on the event producer side. The watchdog logic on the client side is repetitive as well, and a lot of people dismiss remote events as unreliable, while I think we have the means to change that notion. For that matter, I believe the pattern is common enough that I consider it a proper (optional) addition to the Jini Distributed Event Model, together of course with the 'inverted' event model. Once we 'standardize' this practice we can develop client-side utilities and framework support for the server, though of course you are still allowed to do all the heavy lifting yourself ;-). We can write articles about best practices, etc. The bottom line is that we should be able to create event-based solutions for which our friends in the 'you want data, I have data' camp have to write oh so many lines of error-prone code to reach the same level of robustness (or information to base decisions upon).

As a last note, I think the addition of such a protocol would be particularly useful for ServiceRegistrar.
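On the client side, the interposed idle-tick listener Greg describes (whose attached Harvester source is not reproduced in this message) might look roughly like this. The names and the simplified Listener interface are illustrative stand-ins, not Harvester's actual RemoteEventSupervisor or the real net.jini.core.event.RemoteEventListener:

```java
import java.util.concurrent.atomic.AtomicLong;

/**
 * Sketch: a listener interposed in front of the application's real
 * listener. It counts idle ticks (driven by an external periodic timer)
 * and fires a fault action when the silence exceeds a threshold.
 */
public class IdleTickSupervisor {
    /** Simplified stand-in for RemoteEventListener. */
    public interface Listener { void notify(Object event); }

    private final Listener delegate;
    private final long failureThreshold;   // ticks of silence tolerated
    private final Runnable faultAction;    // e.g. ping, switch to backup
    private final AtomicLong idleTicks = new AtomicLong();

    public IdleTickSupervisor(Listener delegate, long failureThreshold,
                              Runnable faultAction) {
        this.delegate = delegate;
        this.failureThreshold = failureThreshold;
        this.faultAction = faultAction;
    }

    /** Any event, real or supervisory, proves liveness and resets the count. */
    public void notify(Object event) {
        idleTicks.set(0);
        delegate.notify(event);
    }

    /** Driven by a periodic timer; fires the fault action once the
     *  silence exceeds the threshold. */
    public void tick() {
        if (idleTicks.incrementAndGet() > failureThreshold) {
            idleTicks.set(0);  // avoid re-firing on every subsequent tick
            faultAction.run();
        }
    }
}
```

Note how this sketch captures the point of the argument above: only when the producer is known to send supervisory events within the interval does the fault action firing unambiguously mean trouble rather than a quiet market.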
Not only would this enable a test of whether callbacks can be received (Jini Distributed Event Protocol), it would also allow faster detection of whether the state you hold for a lookup service, based on the events received, can be trusted, compared to waiting for the lookup service to be discarded. As such, if this optional protocol were available, I would like to see the SDM modified to (optionally) take advantage of this mechanism.

--
Mark