geronimo-dev mailing list archives

From David Blevins <>
Subject Re: Client/Server Multicast Discovery and Failover
Date Mon, 29 Sep 2008 21:46:37 GMT
Checked in a small test app that allows this stuff to be taken for a
spin.

Hopefully we can use it as a tool to start getting some feedback.
Some of the parts of it are getting reworked, but it should be runnable.


On Sep 12, 2008, at 5:43 PM, David Blevins wrote:

> I've added some functionality to OpenEJB trunk which has been  
> enabled in Geronimo trunk.  Here's an overview of how it works:
>
> What we have going on from a tech perspective is each server sends  
> and receives a multicast heartbeat.  Each multicast packet contains  
> a single URI that advertises a service, its group, and its  
> location.  Say for example "cluster1:ejb:ejbd://thehost:4201".  We  
> can definitely explore the SLP format as Alan suggests.
>
> There are other advantages of the simple, unchanging, URI style.   
> The URI is essentially stateless as there is no "I'm alive" URI or
> an "I'm dead" URI; there is simply a URI for each service a server
> offers and its presence on the network indicates its availability  
> and its absence indicates the service is no longer available.  In  
> this way the issues with UDP being unordered and unreliable melt  
> away as state is no longer a concern and packet sizes are always  
> small.  Complicated libraries that ride atop UDP and attempt to  
> offer reliability (retransmission) and ordering on UDP can be  
> avoided. UDP/Multicast is only used for discovery and from there on  
> out critical information is transmitted over TCP/IP which is  
> obviously going to do a better job at ensuring reliability and  
> ordering.
>
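As a rough illustration of that single-URI heartbeat, each packet can be decomposed into its group, service type, and location. The class and field names below are invented for the sketch, not OpenEJB's actual API:

```java
import java.net.URI;

// Hypothetical sketch of the heartbeat convention described above; the
// real OpenEJB classes differ.  Each multicast packet carries exactly one
// small URI of the form "<group>:<service>:<location>", e.g.
// "cluster1:ejb:ejbd://thehost:4201".  Presence on the network means the
// service is available; absence means it is gone -- no "alive"/"dead"
// messages, so ordering and retransmission never matter.
final class ServiceAdvertisement {
    final String group;    // e.g. "cluster1"
    final String service;  // e.g. "ejb"
    final URI location;    // e.g. ejbd://thehost:4201

    ServiceAdvertisement(String packet) {
        int first = packet.indexOf(':');
        int second = packet.indexOf(':', first + 1);
        this.group = packet.substring(0, first);
        this.service = packet.substring(first + 1, second);
        this.location = URI.create(packet.substring(second + 1));
    }
}
```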
> On the client side of things, a special "multicast://" URL can be  
> used in the InitialContext properties to signify that multicast  
> should be used to seed the connection process.  Such as:
>   Properties properties = new Properties();
>   properties.setProperty(Context.INITIAL_CONTEXT_FACTORY,
>       "org.apache.openejb.client.RemoteInitialContextFactory");
>   properties.setProperty(Context.PROVIDER_URL, "multicast://...");
>   InitialContext remoteContext = new InitialContext(properties);
>
> The URL has optional query parameters such as "schemes" and "group"  
> and "timeout" which allow you to zero in on a particular type of  
> service in a particular cluster group, as well as set how long you
> are willing to wait in the discovery process before finally giving
> up.  The first matching service that it sees "flowing" around on the  
> UDP stream is the one it picks and sticks to for that and subsequent  
> requests, ensuring UDP is only used when there are no other servers  
> to talk to.
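For illustration, those query parameters compose into the PROVIDER_URL like this. The "schemes", "group" and "timeout" names are the ones described above, but the multicast group address 239.255.2.3:6142 is an assumed example value, not a documented default:

```java
import java.net.URI;

// The "schemes", "group" and "timeout" query parameters described above,
// composed into a PROVIDER_URL.  The host/port 239.255.2.3:6142 is an
// assumed example, not necessarily what a real deployment uses.
String providerUrl =
        "multicast://239.255.2.3:6142?schemes=ejbd&group=cluster1&timeout=5000";
URI uri = URI.create(providerUrl);
// uri.getHost()  -> "239.255.2.3"  (the shared multicast "topic" host)
// uri.getPort()  -> 6142
// uri.getQuery() -> "schemes=ejbd&group=cluster1&timeout=5000"
```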
>
> On each request to the server, the client will send the version number
> associated with the list of servers in the cluster it is aware of.
> Initially this version will be zero and the list will be empty.   
> Only when the server sees the client has an old list will the server  
> send the updated list.  This is an important distinction as the list  
> (ClusterMetaData) is not transmitted back and forth on every  
> request, only on change.  If the membership of the cluster is stable  
> there is essentially no clustering overhead to the protocol -- an
> 8-byte overhead on each request and 1 byte on each response -- so
> you will *not* see response times degrade as more members are added
> to the cluster.  This new list takes effect for all proxies that
> share the same ServerMetaData.  Internally we
> key the ClusterMetaData by ServerMetaData.  I originally had the  
> version be a simple "increment by one" strategy, but eventually went  
> with the value of System.currentTimeMillis(), for two reasons.
> First, it's possible more than one server is reachable via the same
> ServerMetaData (i.e. multicast://) and each server has its own list
> and version number.  Secondly, if a restarted server resumed an
> incrementing counter at zero, the client could be stuck thinking it
> has a more current list than the server.
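A toy model of that version check might look like the following; the class and method names are hypothetical stand-ins, not the actual ClusterMetaData wire code:

```java
// Toy model of the version exchange described above -- names are
// hypothetical, not OpenEJB's actual classes.  The server ships the
// server list only when the client's version is stale.
final class ClusterList {
    final long version;       // System.currentTimeMillis() at last change
    final String[] servers;
    ClusterList(long version, String... servers) {
        this.version = version;
        this.servers = servers;
    }
}

final class ClusterRegistry {
    // Initially version zero and an empty list, as described above.
    private volatile ClusterList current = new ClusterList(0);

    void membershipChanged(String... servers) {
        current = new ClusterList(System.currentTimeMillis(), servers);
    }

    // Null means "client is up to date": nothing beyond the one status
    // byte would go back on the wire.
    ClusterList updateFor(long clientVersion) {
        ClusterList snapshot = current;
        return clientVersion == snapshot.version ? null : snapshot;
    }
}
```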
>
> When a server shuts down, new connections are refused, existing
> connections not in mid-request are closed, any remaining connections  
> are closed immediately after completion of the request in progress  
> and clients can failover gracefully to the next server in the list.   
> If a server crashes requests are retried on the next server in the  
> list.  This failover pattern is followed until there are no more  
> servers in the list at which point the client attempts a final  
> multicast search (if it was created with a multicast PROVIDER_URL)  
> before abandoning the request and throwing an exception to the  
> caller.  Currently, the failover is ordered but could very easily be  
> made random.  The multicast discovery aspect of the client adds a  
> nice randomness to the selection of the first server that is perhaps  
> somewhat "just".  Theoretically, servers that are under more load
> will send out fewer heartbeats than servers with no load.  This may
> not happen as theory dictates, but certainly as we get more ejb  
> statistic data wired into the server functionality we can pursue  
> deliberate heartbeat throttling techniques that might make that  
> theory really sing in practice.
>
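The failover order described above -- walk the list, then one last multicast search before throwing -- can be sketched as follows. The Transport interface and its method names are invented for illustration, not the real client internals:

```java
import java.util.List;

// Sketch of the failover pattern described above; Transport and its
// methods are hypothetical stand-ins for the real OpenEJB client code.
interface Transport {
    String request(String server, String payload) throws Exception;
    List<String> multicastSearch();  // empty if not created via multicast://
}

final class FailoverClient {
    private final Transport transport;
    FailoverClient(Transport transport) { this.transport = transport; }

    String invoke(List<String> servers, String payload) throws Exception {
        Exception last = null;
        // Ordered failover: retry the request on the next server in the list.
        for (String server : servers) {
            try {
                return transport.request(server, payload);
            } catch (Exception e) {
                last = e;  // dead server: fall through to the next one
            }
        }
        // List exhausted: one final multicast search before giving up.
        for (String server : transport.multicastSearch()) {
            try {
                return transport.request(server, payload);
            } catch (Exception e) {
                last = e;
            }
        }
        throw last != null ? last : new Exception("no servers available");
    }
}
```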
> On the G side of things, the multicast functionality has been copied  
> into Geronimo.  Still need to get it updated to the latest changes.   
> We'll eventually want OpenEJB getting notifications from the  
> Geronimo version instead of using its own.  Once that is done we
> can remove the dep on the openejb-multicast jar.  For the moment I  
> just tucked the multicast server implementation into the  
> EjbDaemonGBean as a temporary solution.  A tricky thing is that when
> we get that set up as its own server component we won't want the
> port offset and the hostname to affect the multicast host and port.   
> The combination of the multicast host and port essentially creates a
> "topic" that all members of the network can listen to and write  
> messages to.  So any servers that are in the same cluster will need  
> to listen on the same host/port.
>
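A minimal listener for that shared host/port "topic" might look like this; the group address 239.255.2.3:6142 is again only an assumed example value:

```java
import java.net.DatagramPacket;
import java.net.InetAddress;
import java.net.MulticastSocket;
import java.nio.charset.StandardCharsets;

// Minimal sketch of joining the shared multicast "topic" described above.
// The group address is an assumed example; every server and client in the
// same cluster must use the same host/port pair to hear each other.
final class HeartbeatListener {
    static final String GROUP_HOST = "239.255.2.3";  // assumed example value
    static final int GROUP_PORT = 6142;              // assumed example value

    String receiveOne() throws Exception {
        InetAddress group = InetAddress.getByName(GROUP_HOST);
        try (MulticastSocket socket = new MulticastSocket(GROUP_PORT)) {
            socket.joinGroup(group);     // subscribe to the shared "topic"
            byte[] buf = new byte[512];  // heartbeat packets are always small
            DatagramPacket packet = new DatagramPacket(buf, buf.length);
            socket.receive(packet);      // blocks until a heartbeat arrives
            return new String(packet.getData(), 0, packet.getLength(),
                    StandardCharsets.UTF_8);
        }
    }
}
```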
> We could really use a GUI for this stuff too.  Is there anyone out  
> there with a few spare cycles who wants to write up a trivial little  
> "show me the servers on the cluster" kind of portlet for the console?
> -David
