geronimo-dev mailing list archives

From Jules Gosnell <>
Subject Re: Clustering - JGroups issues and others
Date Tue, 18 Oct 2005 14:41:13 GMT

wrote:

>Here are my 5 cents... I have some comments regarding clustering based on
>JGroups. We tried to use this technology and came to certain
>points that render it unusable in our case.
>Many of the cluster caches/replicators assume that all the information is
>propagated to all the nodes in the cluster. Some of the solutions
>propagate only keys, however. In any case, this solution cannot be used
>in sufficiently large clusters, as the rate of updates would eat all the
>node capacity, making it unusable.
This is the dreaded 1->all replication that is a popular implementation 
at the moment. See my previous mail about WADI's avoidance of this, 
which gives it a significant advantage over such solutions in terms of 
scalability.

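The scale argument above can be put in numbers. A rough sketch (not code from WADI or any product; the cluster sizes and buddy count are illustrative):

```java
// Sketch: per-update message counts for total (1->all) replication
// versus fixed-size "buddy" replication.
public class ReplicationCost {

    // 1->all: every update must reach every other node, so the cost
    // per update grows with the cluster.
    static int totalReplicationMessages(int nodes) {
        return nodes - 1;
    }

    // Buddy-style: each update goes only to k chosen partners, so the
    // cost per update is constant regardless of cluster size.
    static int buddyReplicationMessages(int nodes, int k) {
        return Math.min(k, nodes - 1);
    }

    public static void main(String[] args) {
        // On a 100-node cluster, 1->all costs 99 messages per update;
        // with 2 buddies it costs 2, independent of cluster size.
        System.out.println(totalReplicationMessages(100));   // 99
        System.out.println(buddyReplicationMessages(100, 2)); // 2
    }
}
```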
>Regarding JGroups itself. Probably that is specific to the cluster
>facilities in JBoss, but generally JGroups organizes a list of nodes,
>and every node checks the state of the next one in the chain.
I wasn't sure how it worked... interesting ...
We should look into how membership is tracked by ActiveCluster.
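My understanding of the ring scheme described above, sketched as code (an illustration of the idea only, not actual JGroups internals):

```java
import java.util.List;

// Sketch of ring-style failure detection: the membership list is
// treated as a ring, and each node pings only its immediate
// successor rather than every other member.
public class RingMonitor {

    // Returns the member that 'self' is responsible for monitoring:
    // the next node in the list, wrapping around at the end.
    static String successor(List<String> members, String self) {
        int i = members.indexOf(self);
        return members.get((i + 1) % members.size());
    }

    public static void main(String[] args) {
        List<String> ring = List.of("A", "B", "C", "D");
        System.out.println(successor(ring, "B")); // C
        System.out.println(successor(ring, "D")); // wraps around to A
    }
}
```

The appeal is that monitoring traffic stays constant per node; the drawback, as noted below, is what happens when a whole run of consecutive nodes dies at once.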

> The
>problem is that in many cases servers may fail/disconnect in groups,
>which causes two problems: the segmentation of the cluster 
cluster segmentation is a really tricky issue :-( - do all the segments 
then try to arrange themselves into smaller clusters, shifting loads of 
state around, or is JGroups smart enough to put all the pieces back 
together before passing control back to the application?

>extremely high failure report time, since with blade-based architectures
>servers shut down in large packs
do these 'packs' correspond to racks? I have plans (NYI) for pluggable 
algorithms that will allow WADI to choose e.g. nodes in other racks, on 
other power sources, in other buildings etc. as replication partners; 
otherwise you will lose state in a situation like this, if you happen to 
have yours backed up onto the node next to you in the same rack...
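The partner-selection idea could look roughly like this. Every name here (Node, rack, choosePartners) is hypothetical, since none of this is implemented yet:

```java
import java.util.ArrayList;
import java.util.Comparator;
import java.util.List;

// Hypothetical sketch of pluggable, rack-aware replica placement:
// prefer replication partners whose rack differs from ours, falling
// back to same-rack nodes only when nothing else is available.
public class RackAwarePlacement {

    record Node(String name, String rack) {}

    static List<Node> choosePartners(Node self, List<Node> peers, int k) {
        List<Node> ranked = new ArrayList<>(peers);
        ranked.remove(self);
        // Off-rack nodes sort first (false < true), so a whole-rack
        // failure cannot take out both the primary and its replicas.
        ranked.sort(Comparator.comparing((Node n) -> n.rack().equals(self.rack())));
        return ranked.subList(0, Math.min(k, ranked.size()));
    }

    public static void main(String[] args) {
        Node self = new Node("a1", "rack1");
        List<Node> peers = List.of(self,
                new Node("a2", "rack1"),
                new Node("b1", "rack2"),
                new Node("c1", "rack3"));
        // Both chosen partners live off rack1: b1, c1.
        for (Node n : choosePartners(self, peers, 2)) {
            System.out.println(n.name());
        }
    }
}
```

The same ranking function could weight power source or building instead of (or as well as) rack - that is the point of making the algorithm pluggable.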

> and it really takes time to
>detect several sequentially disconnected servers.
What sort of lag are we talking about - a few seconds, or a few tens of 
seconds?
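A back-of-envelope sketch of why a run of sequential failures is slow to detect in a ring; the timeout value is an assumption for illustration, not a JGroups default:

```java
// In a ring, failures are discovered one hop at a time: a node only
// suspects its dead successor after a full timeout, then takes over
// monitoring the next node, which costs another timeout, and so on.
// So worst-case detection lag grows linearly with the run of failures.
public class DetectionLag {

    static double worstCaseLagSeconds(int consecutiveFailures, double timeoutSeconds) {
        return consecutiveFailures * timeoutSeconds;
    }

    public static void main(String[] args) {
        // e.g. a 10s failure-detection timeout and a rack of 8 blades
        // going down together: up to 80s before the last is suspected.
        System.out.println(worstCaseLagSeconds(8, 10.0)); // 80.0
    }
}
```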

>To overcome these problems we ended up with a "star" architecture, where
>the central node is responsible for maintaining the list of other nodes.
>The availability of the central node itself could be ensured by
>facilities like Red Hat Cluster Suite or similar (service failover,
>floating IPs, etc.).
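The star scheme described above might be sketched like this (illustrative only; names and the timeout are invented, not from any real product):

```java
import java.util.HashMap;
import java.util.Map;
import java.util.Set;
import java.util.TreeSet;

// Minimal sketch of star membership: one central node keeps the
// member list and ages out nodes whose heartbeats stop, so a whole
// group of failures is noticed in one sweep rather than one
// ring-hop at a time.
public class StarRegistry {

    private final Map<String, Long> lastHeartbeat = new HashMap<>();
    private final long timeoutMillis;

    StarRegistry(long timeoutMillis) {
        this.timeoutMillis = timeoutMillis;
    }

    // Member nodes call this periodically.
    void heartbeat(String node, long nowMillis) {
        lastHeartbeat.put(node, nowMillis);
    }

    // The central node derives the live view from heartbeat recency.
    Set<String> liveMembers(long nowMillis) {
        Set<String> live = new TreeSet<>();
        for (Map.Entry<String, Long> e : lastHeartbeat.entrySet()) {
            if (nowMillis - e.getValue() <= timeoutMillis) {
                live.add(e.getKey());
            }
        }
        return live;
    }

    public static void main(String[] args) {
        StarRegistry reg = new StarRegistry(5000);
        reg.heartbeat("A", 0);
        reg.heartbeat("B", 0);
        reg.heartbeat("A", 4000);
        // At t=6000, B's last beat (t=0) exceeds the 5s timeout.
        System.out.println(reg.liveMembers(6000)); // [A]
    }
}
```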
Hmmm... I understand why you went for this architecture, but I would 
prefer to find one that is homogeneous - i.e. we don't need a special, 
non-standard configuration for the central node. Deployment is much 
easier if every node has the same configuration. Still, this is good 
input, and it has got me thinking in a direction I had not really 
considered before.

Thanks, Valeri,



"Open Source is a self-assembling organism. You dangle a piece of
string into a super-saturated solution and a whole operating-system
crystallises out around it."

 * Jules Gosnell
 * Partner
 * Core Developers Network (Europe)
 * Open Source Training & Support.
