geronimo-dev mailing list archives

Site index · List index
Message view « Date » · « Thread »
Top « Date » · « Thread »
From David Blevins <david.blev...@visi.com>
Subject Client/Server Multicast Discovery and Failover
Date Fri, 12 Sep 2008 22:43:15 GMT
I've added some functionality to OpenEJB trunk which has been enabled  
in Geronimo trunk.  Here's an overview of how it works:

DISCOVERY

What we have going on from a tech perspective is each server sends and  
receives a multicast heartbeat.  Each multicast packet contains a  
single URI that advertises a service, its group, and its location.   
Say for example "cluster1:ejb:ejbd://thehost:4201".  We can definitely  
explore the SLP format as Alan suggests.

There are other advantages of the simple, unchanging, URI style.  The  
URI is essentially stateless as there is no "i'm alive" URI or an "i'm  
dead" URI, there is simply a URI for each service a server offers and  
its presence on the network indicates its availability and its absence  
indicates the service is no longer available.  In this way the issues  
with UDP being unordered and unreliable melt away as state is no  
longer a concern and packet sizes are always small.  Complicated  
libraries that ride atop UDP and attempt to offer reliability  
(retransmission) and ordering on UDP can be avoided. UDP/Multicast is  
only used for discovery and from there on out critical information is  
transmitted over TCP/IP which is obviously going to do a better job at  
ensuring reliability and ordering.

On the client side of things, a special "multicast://" URL can be used  
in the InitialContext properties to signify that multicast should be  
used to seed the connection process.  Such as:

    Properties properties = new Properties();
    properties.setProperty(Context.INITIAL_CONTEXT_FACTORY,  
"org.apache.openejb.client.RemoteInitialContextFactory");
    properties.setProperty(Context.PROVIDER_URL, "multicast:// 
239.255.2.3:6142");
    InitialContext remoteContext = new InitialContext(properties);

The URL has optional query parameters such as "schemes" and "group"  
and "timeout" which allow you to zero in on a particular type of  
service of a particular cluster group as well as set how long you are  
willing to wait in the discovery process till finally giving up.  The  
first matching service that it sees "flowing" around on the UDP stream  
is the one it picks and sticks to for that and subsequent requests,  
ensuring UDP is only used when there are no other servers to talk to.

FAILOVER

On each request the server, the client will send the version number  
associated with the list of servers in the cluster it is aware of.   
Initially this version will be zero and the list will be empty.  Only  
when the server sees the client has an old list will the server send  
the updated list.  This is an important distinction as the list  
(ClusterMetaData) is not transmitted back and forth on every request,  
only on change.  If the membership of the cluster is stable there is  
essentially no clustering overhead to the protocol -- 8 byte overhead  
to each request and 1 byte on each response -- so you will *not* see  
an exponential slowdown in response times the more members are added  
to the cluster.  This new list takes affect for all proxies that share  
the same ServerMetaData data.  Internally we key the ClusterMetaData  
by ServerMetaData.  I originally had the version be a simple  
"increment by one" strategy, but eventually went with the value of  
System.currentTimeMillis().  It's possible more than one server is  
reachable via the ServerMetaData (i.e. multicast://) and each server  
has it's own list and version number.  Secondly, if a server is  
restarted, the version number will go back to zero and the client  
could be stuck thinking it has a more current list than the server.

When a server shuts down, more connections are refused, existing  
connections not in mid-request are closed, any remaining connections  
are closed immediately after completion of the request in progress and  
clients can failover gracefully to the next server in the list.  If a  
server crashes requests are retried on the next server in the list.   
This failover pattern is followed until there are no more servers in  
the list at which point the client attempts a final multicast search  
(if it was created with a multicast PROVIDER_URL) before abandoning  
the request and throwing an exception to the caller.  Currently, the  
failover is ordered but could very easily be made random.  The  
multicast discovery aspect of the client adds a nice randomness to the  
selection of the first server that is perhaps somewhat "just".   
Theoretically, servers that are under more load will send out less  
heart beats than servers with no load. This may not happen as theory  
dictates, but certainly as we get more ejb statistic data wired into  
the server functionality we can pursue deliberate heartbeat throttling  
techniques that might make that theory really sing in practice.

GERONIMO

On the G side of things, the multicast functionality has been copied  
into Geronimo.  Still need to get it updated to the latest changes.   
We'll eventually want OpenEJB getting notifications from the Geronimo  
version instead of using it's own.  Once that is done we can remove  
the dep on the openejb-multicast jar.  For the moment I just tucked  
the multicast server implementation into the EjbDaemonGBean as a  
temporary solution.  A tricky thing is that when we get that setup as  
it's own server component we won't want the port offset and the  
hostname to affect the multicast host and port.  The combination of  
the mutlicast host and port essentially creates a "topic" that all  
members of the network can listen to and write messages to.  So any  
servers that are in the same cluster will need to listen on the same  
host/port.

We could really use a GUI for this stuff too.  Is there anyone out  
there with a few spare cycles who wants to write up a trivial little  
"show me the servers on the cluster" kind of portlet for the console?


-David


Mime
View raw message