From: Jules Gosnell <jules@coredevelopers.net>
Date: Wed, 19 Oct 2005 14:21:57 +0100
To: dev@geronimo.apache.org
CC: dev@wadi.codehaus.org
Subject: Re: Clustering - JGroups issues and others

Valeri.Atamaniouk@nokia.com wrote:

>Hello
>
>My concern regarding the clustering is that the mechanism itself is more
>general than session replication alone. (Un)fortunately, HTTP (web) is
>not the only interface in the cluster. From our perspective, all
>clustered facilities should be based on the same mechanism in the
>solution, as otherwise the behaviour of the system is hardly predictable.
>So, if we base some application/service distribution model on the
>assumption that sub-partitioning is possible, we may end up with
>interesting (from a technical perspective) problems: multiple services,
>instead of a single one, reporting contradictory values.
>

I see two points here:

1) Whatever solution we come up with to deal with fragmentation should be applicable to as many different areas of Geronimo clustering as possible.

I agree wholeheartedly. Whilst the ideas I threw out were for WADI - and, by extension, any form of session management (e.g. OpenEJB) - they were not intended to imply that this was the only problem. It is just that problems involving clustered state are often the most difficult to deal with in terms of scalability and availability. We need to divide up the areas of functionality that we are compiling a list of, decide how each one should respond to cluster fragmentation, and work out which approaches can be shared.

2) You have illustrated the fragmentation issue with a particular usecase - singleton services. That is my reading of your example - I hope I haven't misunderstood.

I'm not sure that I actually see the problem with singleton services in this case, but I guess it depends on how they are elected. I would expect every fragment that found itself running without a required service to elect one node to perform it. As fragments merged and realised that they held two instances of the same singleton service, one of those instances would be de-elected. By the time the whole cluster had reformed, only one instance of the service would remain.
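To make that election/de-election idea concrete, here is the sort of thing I have in mind - just a sketch, nothing like this exists in WADI today, and all the names are made up. It assumes only that node ids are totally ordered, so every member of a fragment can compute the winner locally:

  import java.util.SortedSet;

  // Hypothetical sketch: the member with the lowest id in the current
  // membership view runs the singleton; everyone else stands down.
  public class SingletonElector {

      private final String localNodeId;

      public SingletonElector(String localNodeId) {
          this.localNodeId = localNodeId;
      }

      // Deterministic rule: every node in a fragment reaches the same
      // answer from the same view, without any extra communication.
      public boolean shouldRunSingleton(SortedSet<String> currentView) {
          return localNodeId.equals(currentView.first());
      }

      // Called on every membership change (fragmentation or merge).
      public void onViewChange(SortedSet<String> view, Runnable start, Runnable stop) {
          if (shouldRunSingleton(view)) {
              start.run();  // we are (still) the elected instance
          } else {
              stop.run();   // de-elect ourselves: another member won
          }
      }
  }

Because the rule is a pure function of the membership view, two fragments that merge converge on a single instance as soon as they agree the new view - no extra election protocol is needed.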
Having said all of this, I would be much more in favour of an architecture which did not use singleton services at all. They represent a single point of failure and contention. If there is a way to partition the service, or to run a number of instances, I think that would be preferable. Ideally, I would like to see it partitioned to the point that every node carried a piece of the service and could be self-sufficient if it suddenly became isolated from the others.

The architecture behind WADI's distributed hash table works like this. A node should only allocate session ids which map to buckets/partitions that it owns; thus a session may be born, live and die on a single node (whilst remaining available to all) without that node having to talk to any other node - except for replication traffic, and the session need not be replicated in order to be distributable/migratable to other nodes.
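Something like the following captures that invariant - this is only a sketch of the idea, not WADI's actual code, and the class and method names are hypothetical:

  import java.util.Set;
  import java.util.UUID;

  public class LocalBucketSessionIdAllocator {

      private final int numBuckets;             // fixed for the cluster's lifetime
      private final Set<Integer> ownedBuckets;  // the buckets this node currently owns

      public LocalBucketSessionIdAllocator(int numBuckets, Set<Integer> ownedBuckets) {
          this.numBuckets = numBuckets;
          this.ownedBuckets = ownedBuckets;
      }

      // The same mapping every node uses to route a request for a given
      // session to the bucket (and hence the node) responsible for it.
      int bucketOf(String sessionId) {
          return (sessionId.hashCode() & 0x7fffffff) % numBuckets;
      }

      // Draw random ids until one lands in a bucket we own. With buckets
      // spread evenly this takes numBuckets / ownedBuckets.size() tries
      // on average - and the new session never needs a remote call at birth.
      public String allocateSessionId() {
          while (true) {
              String candidate = UUID.randomUUID().toString();
              if (ownedBuckets.contains(bucketOf(candidate))) {
                  return candidate;
              }
          }
      }
  }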
>In our case the fragmentation of the cluster would lead to the fact
>that all fragments would try to reboot all other fragments using
>management interfaces :) A true nightmare...
>

Sounds very nasty :-)

>Besides, it is much easier to maintain/predict the cluster behaviour
>when a node is considered active only when it can reliably reach a
>certain (central) cluster network service. This is probably different
>from the traditional approach, but from our perspective it is better to
>lose the whole service than to get something unpredictable. The reason
>is that in both cases it is reported as a system outage, but in the
>second one it is much more difficult to detect/analyse/fix.
>

Agreed - and perhaps this could be one form of pluggable membership-tracking strategy, sitting in the clustering substrate (there is a sketch of what such a strategy might look like below). This would mean that in the case of fragmentation, only those nodes remaining in the same fragment as the 'master' node would continue normally. All the others, on losing contact with this node, would decide that they had fallen out of the cluster and seek to reestablish a connection - refusing to service any requests (and therefore maintaining the consistency of the clustered service) until they had rejoined the surviving fragment. As you have mentioned, you would have to make absolutely sure of the availability of this 'master' node, otherwise you would lose your whole cluster.

With a model like this, we could describe your architecture, the JGroups architecture and a number of other possibilities, whilst the issue of membership remained abstracted away from the clustered services themselves...

How does that sound?

Jules

>-valeri
>
>>-----Original Message-----
>>From: ext Jules Gosnell [mailto:jules@coredevelopers.net]
>>Sent: 19 October, 2005 13:51
>>To: dev@geronimo.apache.org; dev@wadi.codehaus.org
>>Subject: Re: Clustering - JGroups issues and others
>>
>>Thanks for coming back, Valeri.
>>
>>You have put your finger fairly and squarely on the cluster
>>implementer's nightmare :-)
>>
>>This really is a thorny problem which I keep coming back to.
>>
>>I'm assuming that if the cluster becomes fragmented into different
>>subgroups (mapping to h/w enclosures etc.), and the fragments can all
>>still see common backend services but not each other, then e.g. the
>>h/w load-balancer in a web deployment may still be able to see all
>>nodes in all fragments. Since traffic is still arriving at more than
>>one cluster fragment, all sorts of problems may arise.
>>
>>I guess WADI might do something like this:
>>
>>The cluster fragments...
>>
>>Each fragment would find that it had an incomplete set of
>>buckets/partitions (WADI's architecture is to partition the session
>>space into a fixed number of buckets and share responsibility for
>>these between the cluster members).
>>
>>Each fragment would have to assume that the missing partitions had
>>been lost and would not be rejoining (in case this were really the
>>case), so the missing partitions would have to be resurrected and
>>repopulated with sessions drawn from replicated copies. Thus each
>>fragment would end up with a complete set of partitions.
>>
>>Each fragment would be likely to end up with an incomplete session
>>set that intersected with the session sets held by other fragments
>>(since it is likely that not all sessions could be resurrected, and
>>some would be resurrected within more than one fragment).
>>
>>Assuming (and I think we would have to make this a hard requirement)
>>that the load-balancer supported session affinity correctly, requests
>>would continue to be directed to the node holding the original (not
>>resurrected) version of their session.
>>
>>So, at this point, we have survived the fragmentation and are still
>>fully available to our clients, although there may have been quite a
>>lag whilst partitions were rebuilt/repopulated, and the footprint of
>>each node has probably increased, each fragment now carrying a larger
>>proportion of the original cluster's sessions than it did before (the
>>session sets intersect).
>>
>>Then, the network comes back :-)
>>
>>Each fragment would become aware of the other fragments. Multiple
>>copies of partitions and sessions would now exist within the same
>>cluster.
>>
>>Multiple instances of the same partition can be merged by simply
>>taking the union of the session sets that they manage.
>>
>>Merging multiple instances of the same session is a bit more awkward.
>>If sessions carried some sort of version (HttpSessions carry a
>>LastAccessedTime field), then all instances with the same 'version'
>>could be collapsed. I guess we would then move on to a pluggable
>>strategy of some sort. The simplest of these would probably just
>>assume that only one session instance had been involved in a dialogue
>>with the client since the fracture, since the client was 'stuck' to
>>its node. If this is the case, then sessions with a lower version
>>will all be snapshots of the original session taken at the point of
>>fracture, will not have diverged further, and so may be safely
>>discarded (we may be able to remember/deduce the time of fracture and
>>discard any session with a LastAccessedTime before that point),
>>leaving only the original session to continue. If divergence has
>>occurred, then some custom, application-space code might be run that
>>can use application-level knowledge to merge the various session
>>versions. But I think that if we have got to this stage, then we are
>>in real trouble and should probably just declare an error and drop
>>the session.
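Picking up the pluggable membership-tracking idea from further up this mail: the SPI might look something like this - again just a sketch with invented names, to show that your 'master'-based model and the more optimistic one could sit behind the same abstraction:

  import java.util.Set;

  // Hypothetical SPI: the clustering substrate asks the strategy, on
  // every view change, whether this node may keep servicing requests.
  public interface MembershipStrategy {
      boolean mayServiceRequests(Set<String> currentView);
  }

  // Your model: a node is active only while it can see the designated
  // 'master' node; everyone cut off refuses traffic and tries to rejoin,
  // preserving the consistency of the surviving fragment.
  class MasterReachabilityStrategy implements MembershipStrategy {

      private final String masterNodeId;

      MasterReachabilityStrategy(String masterNodeId) {
          this.masterNodeId = masterNodeId;
      }

      public boolean mayServiceRequests(Set<String> currentView) {
          return currentView.contains(masterNodeId);
      }
  }

  // The optimistic model sketched in the quoted mail below: every
  // fragment carries on, and we reconcile on merge.
  class AlwaysActiveStrategy implements MembershipStrategy {
      public boolean mayServiceRequests(Set<String> currentView) {
          return true;
      }
  }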
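And to make the merge-by-version strategy just quoted concrete - a sketch only, with hypothetical types, since none of this is implemented:

  import java.util.Collection;

  // Stand-ins for whatever WADI's real session and merge SPI would be.
  interface Session {
      long getLastAccessedTime();
  }

  interface SessionMergeStrategy {
      // Application-space knowledge of how to reconcile diverged copies.
      Session merge(Session a, Session b);
  }

  class SessionMerger {

      private final SessionMergeStrategy fallback; // null => just declare an error

      SessionMerger(SessionMergeStrategy fallback) {
          this.fallback = fallback;
      }

      // fractureTime: our best guess at when the cluster split. Copies
      // not accessed since then are snapshots of the same original and
      // can safely lose to any newer copy.
      Session merge(Collection<Session> copies, long fractureTime) {
          Session survivor = null;
          for (Session s : copies) {
              if (survivor == null) {
                  survivor = s;
              } else if (s.getLastAccessedTime() == survivor.getLastAccessedTime()) {
                  // same 'version' => identical snapshots; collapse silently
              } else if (Math.min(s.getLastAccessedTime(),
                                  survivor.getLastAccessedTime()) <= fractureTime) {
                  // only one copy was touched since the split: keep the newer
                  survivor = s.getLastAccessedTime() > survivor.getLastAccessedTime()
                          ? s : survivor;
              } else if (fallback != null) {
                  survivor = fallback.merge(survivor, s); // genuine divergence
              } else {
                  throw new IllegalStateException(
                          "session diverged in more than one fragment");
              }
          }
          return survivor;
      }
  }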
>>
>>None of this is yet implemented in WADI, but it is stuff that I
>>dream/have-nightmares about when I get too geeky :-) I hope to put
>>some of this functionality in at some point.
>>
>>What sort of frequency might this type of scenario occur with? It
>>will be a lot of work to protect against it, but I realise that a
>>truly enterprise-level solution must be able to survive this sort of
>>thing.
>>
>>If anyone else has had thoughts about surviving cluster
>>fragmentation, I would be delighted to hear them.
>>
>>Jules
>>

--
"Open Source is a self-assembling organism. You dangle a piece of string
into a super-saturated solution and a whole operating-system
crystallises out around it."

/**********************************
 * Jules Gosnell
 * Partner
 * Core Developers Network (Europe)
 *
 * www.coredevelopers.net
 *
 * Open Source Training & Support.
 **********************************/