geronimo-dev mailing list archives

From Jules Gosnell <ju...@coredevelopers.net>
Subject Re: Web Clustering : Stick Sessions with NO SHARED STORE
Date Sat, 25 Oct 2003 09:10:09 GMT
Vivek Biswas wrote:

> Use of "Internal Network Traffic" for auto-partitioning will give the
> load balancer the intelligence to make dynamic decisions which would
> otherwise not be possible.
>     This will make the system really flexible: instead of unnecessarily
> replicating all the session states to all the servers, state is only
> replicated to a few servers for backup.
>  

Vivek - as I pointed out in my last email - you're making some big 
assumptions here about the load balancer - how would you integrate this 
feature with mod_jk or BigIP?

Partitioning of state is tried and trusted technology - the tricky bit 
is doing it so that you move as little state around as possible and 
integrate with the major players in the load-balancing market.



Jules

> ~Vivek Biswas
>
> */Jules Gosnell <jules@coredevelopers.net>/* wrote:
>
>     interesting - I'll digest it fully over the weekend...
>
>     I spotted a couple of points on the first read though....
>
>     - I don't think that there is very much difference between my and
>     your
>     suggestions for auto-partitioning a cluster - it may well be
>     possible to
>     make this a pluggable strategy, so that different criteria such as
>     'Internal Network traffic' could be used to make decisions.
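
As a rough illustration of what such a pluggable strategy could look like
(the interface and the NodeStats type below are invented for this sketch,
not an existing Geronimo API):

    import java.util.List;

    // Hypothetical pluggable strategy for deciding how a cluster partitions itself.
    // Nothing here is an existing Geronimo interface; names are illustrative only.
    public interface PartitionStrategy {

        /** Given per-node statistics, return the groups of nodes that should back each other up. */
        List<List<String>> partition(List<NodeStats> nodes);

        /** Simple value object carrying the criteria a strategy may consult. */
        class NodeStats {
            final String nodeId;
            final long internalNetworkBytesPerSec; // the 'Internal Network traffic' criterion
            final int activeSessions;

            NodeStats(String nodeId, long internalNetworkBytesPerSec, int activeSessions) {
                this.nodeId = nodeId;
                this.internalNetworkBytesPerSec = internalNetworkBytesPerSec;
                this.activeSessions = activeSessions;
            }
        }
    }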
>
>     - If at all possible, the design should work with existing
>     load-balancers, such as the Cisco and F5 product lines, as well as
>     Open
>     Source offerings such as mod_jk and mod_backhand. People may already
>     have well established architectures that they wish to fit Geronimo
>     into.
>     A Geronimo-aware lb may be able to make optimisations that another lb
>     will not see, but the cluster should still perform reasonably with
>     other
>     lbs.
>
>     - Session migration will be expensive and should be avoided when
>     possible. In cases where it is necessary, e.g. [multiple] node
>     failure
>     may lead to a node carrying more state than it can handle, it should
>     probably progress as a low priority background task, so that the
>     cluster
>     is not suddenly flooded with messages which simply encourage more
>     failures etc... and already stressed nodes are not further pegged
>     out by
>     the cost of [de]marshalling 1000s of sessions - this could quickly
>     lead
>     to a domino-effect bringing down the whole cluster (perhaps we could
>     ringfence 'super-partitions'?).
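
One way to keep such migration a low-priority background task is to drain a
queue on a single minimum-priority thread with a pause between sessions. A
minimal sketch, assuming a hypothetical SessionMover callback that performs
one migration:

    import java.util.concurrent.BlockingQueue;
    import java.util.concurrent.LinkedBlockingQueue;
    import java.util.concurrent.TimeUnit;

    // Sketch: migrate sessions one at a time, on a low-priority thread, at a bounded
    // rate, so a stressed cluster is not flooded with [de]marshalling work all at once.
    public class ThrottledMigrator implements Runnable {

        public interface SessionMover {            // hypothetical callback that performs one migration
            void migrate(String sessionId) throws Exception;
        }

        private final BlockingQueue<String> pending = new LinkedBlockingQueue<>();
        private final SessionMover mover;
        private final long pauseMillisBetweenSessions;

        public ThrottledMigrator(SessionMover mover, long pauseMillisBetweenSessions) {
            this.mover = mover;
            this.pauseMillisBetweenSessions = pauseMillisBetweenSessions;
        }

        public void enqueue(String sessionId) { pending.offer(sessionId); }

        public void start() {
            Thread t = new Thread(this, "session-migrator");
            t.setPriority(Thread.MIN_PRIORITY);    // background task: yield to request processing
            t.setDaemon(true);
            t.start();
        }

        @Override
        public void run() {
            while (!Thread.currentThread().isInterrupted()) {
                try {
                    String sessionId = pending.poll(1, TimeUnit.SECONDS);
                    if (sessionId != null) {
                        mover.migrate(sessionId);
                        Thread.sleep(pauseMillisBetweenSessions);  // simple rate cap
                    }
                } catch (InterruptedException e) {
                    Thread.currentThread().interrupt();
                } catch (Exception e) {
                    // a failed migration should not kill the drainer; real code would log and retry
                }
            }
        }
    }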
>
>     Whatever the final design, it needs to be as simple and as robust
>     as it
>     possibly can be. Hybrid solutions, which take advantage of the
>     strengths
>     of different approaches in exchange for a slight increase in
>     complexity
>     are often useful (hence my replication AND shared store approach).
>
>     Whatever replication medium or shared store implementation we use,
>     someone will always come along and want to use something else -
>     JGroups,
>     JMS, DB, JavaSpaces, NFS etc..., so these all need to be pluggable.
>
>
>     I will try to find the time over the weekend to write up my
>     suggestion
>     for auto-partitioning/healing and send it to the list. Then we can
>     compare notes :-)
>
>     Jules
>
>
>     Bhagwat, Hrishikesh wrote:
>
>     >Attached is the PROTOCOL that can help us implement
>     >Clustering without a DB. The PROTOCOL uses the concept
>     >of dynamic partitioning (inspired by Jules Gosnell's
>     >original idea). Also, though it is not mentioned explicitly
>     >in the document, many of the functions will operate using
>     >JMX (thanks to Vivek Biswas :-) for enlightening me on this).
>     >
>     >Summary :
>     >The protocol makes the system "dynamically change itself"
>     >from a Jetty-like topology where ALL SERVERS BACK EACH
>     >OTHER UP to a simple "PRIMARY - SECONDARY" topology
>     >that contains only one backup per server. Thus as LOAD
>     >increases, partitions of ever-diminishing size are created.
>     >Since back-ups happen only within a partition the system
>     >can scale well (administrators can also specify
>     >MAX_SERVERS_IN_CLUSTER and MIN_SERVERS_IN_CLUSTER).
>     >
>     >The document is divided into
>     >
>     >1. Concept
>     >2. Case of Failover
>     >3. Case of Node Added
>     >4. Case of excessive load on a Server
>     >
>     >while the attachment has topics 1 and 2, I am still writing the
>     >others. I thought that topic 2 was particularly complex, so I thought
>     >of putting it up on the mailing-list for all to review and
>     >digest until the relatively simple (I guess) topics 3 and 4 are ready.
>     >
>     >thanks
>     >-hb
>     >
>     >
>     >
>     >
>     >-----Original Message-----
>     >From: Jules Gosnell [mailto:jules@coredevelopers.net]
>     >Sent: Thursday, October 23, 2003 11:55 PM
>     >To: Bhagwat, Hrishikesh; Geronimo Developers List
>     >Cc: Biswas, Vivek
>     >Subject: Re: [Re] Web Clustering : Stick Sessions with Shared Store -
>     >current state of play.
>     >
>     >
>     >I agree on needing to be able to run without a db.
>     >
>     >If you think about it, a db is just a highly specialised node in the
>     >cluster. It is much simpler, and therefore more maintainable, if all
>     >nodes in a Geronimo cluster are homogeneous. We can't lose the db your
>     >business data lives in, but if we can avoid adding another for session
>     >replication it would be an advantage.
>     >
>     >To this end, my design should work without a db. You just tune it so
>     >that passivation never occurs - i.e. unlimited active sessions. You
>     >trade off in-vm space against db space.
>     >
>     >Likewise, if you were an advocate of the 'shared store' approach,
>     you
>     >should be able to constrain the web container to keep '0' active
>     >sessions in memory, but passivate everything immediately to the db.
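
For illustration only, the two tuning extremes described above might be
expressed like this; the class and its knobs are invented, no such API exists:

    // Hypothetical tuning knobs; nothing here is an existing Geronimo API.
    public class SessionManagerConfig {
        private final int maxActiveSessions;     // sessions kept in memory before passivation kicks in
        private final String passivationStore;   // e.g. "jdbc", "nfs", or null for none

        public SessionManagerConfig(int maxActiveSessions, String passivationStore) {
            this.maxActiveSessions = maxActiveSessions;
            this.passivationStore = passivationStore;
        }

        // Replication-only: never passivate, so no database is required at all.
        public static SessionManagerConfig replicationOnly() {
            return new SessionManagerConfig(Integer.MAX_VALUE, null);
        }

        // Shared-store-only: keep nothing in memory, push every session to the store immediately.
        public static SessionManagerConfig sharedStoreOnly(String store) {
            return new SessionManagerConfig(0, store);
        }
    }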
>     >
>     >So, yes, I'm sure we are on the same page.
>     >
>     >
>     >Jules
>     >
>     >
>     >
>     >
>     >Bhagwat, Hrishikesh wrote:
>     >
>     >
>     >
>     >>hi Jules,
>     >>
>     >>I am happy that we are converging, in that, the following approach
>     >>covers very well some of the main goals that I was trying to achieve
>     >>when I wrote my first proposal. I have marked those points below as
>     >>[hb-X] with brief comments.
>     >>
>     >>Thus, keeping the "Goal of the Architecture" the same, I would like
>     >>to propose a new scheme for "replication". While initially I was keen
>     >>on having NO DATA EXCHANGE BETWEEN SERVERS, but rather on having them
>     >>all just use the shared store to persist and retrieve session data,
>     >>you were interested in the contrary. Your approach (mentioned in
>     >>section 2 of http://wiki.codehaus.org/geronimo/Architecture/Clustering)
>     >>is about Session object exchange between servers which are clustered
>     >>and about clusters that are partitioned (statically/dynamically). I
>     >>think the discussion there stops with some issues that you think may
>     >>arise with that kind of scheme.
>     >>
>     >>Ever since Vivek Biswas first pointed it out to me, I have been
>     >>feeling increasingly uncomfortable with the idea of Geronimo being
>     >>DEPENDENT on an EXTERNAL system like a Database for implementing its
>     >>web clustering. Some people on the mailing list have spoken about
>     >>performance issues with the DB approach, but I think with techniques
>     >>like async-write-to-DB such problems can be circumvented. Thus,
>     >>though I still believe it is a solution that can solve the problem, I
>     >>am not comfortable with the use of an external system to aid in
>     >>clustering. I don't know of any GOOD (hi-av) DBs that come for FREE.
>     >>This means that even if Geronimo is available for FREE it can't be
>     >>used without purchasing a DB. The solution as a whole, then, is not
>     >>truly FREE-of-cost.
>     >>
>     >>I have since been looking at many alternatives ... from the
>     >>Jetty-implemented solution of having all machines in one cluster,
>     >>to a 1-primary:1-secondary solution.
>     >>
>     >>Presently I have something cooking. As I get closer to working out
>     >>the details I find it quite similar to your original solution of
>     >>dynamic partitioning. This again convinces me that we seem to
>     >>converge. My proposal is to exchange notes and to jointly come up
>     >>with a detailed design.
>     >>
>     >>Do let me know your thoughts on that .... I am trying to complete a
>     >>document on this new scheme ASAP and will mail you the same.
>     >>
>     >>thanks
>     >>- hb
>     >>
>     >>
>     >>
>     >>-----Original Message-----
>     >>From: Jules Gosnell [mailto:jules@coredevelopers.net]
>     >>Sent: Thursday, October 23, 2003 3:44 AM
>     >>To: geronimo-dev@incubator.apache.org
>     >>Subject: Re: [Re] Web Clustering : Stick Sessions with Shared
>     Store -
>     >>current state of play.
>     >>
>     >>
>     >>Guys,
>     >>
>     >>since this topic has come up again, I thought this would be a useful
>     >>point to braindump my current ideas for comment and as a common point of
>     >>reference...
>     >>
>     >>Here goes :
>     >>
>     >>
>     >>
>     >>Each session has one 'primary' node and n 'replicant' nodes
>     associated
>     >>with it.
>     >>
>     >>Sticky load-balancing is a hard requirement.
>     >>
>     >>Changes to the session may only occur on the primary node.
>     >>
>     >>[hb-1] I always intended only the OWNER NODE (remember ... I wasn't
>     >>keen on the Primary/Sec. concept, so I am using this term) to make
>     >>changes to the SESSION. Also, only ON CHANGE would the session be
>     >>sent for persistence.
>     >>
>     >>
>     >>Such changes are then replicated (possibly asynchronously, depending
>     >>on data integrity requirements) to the replicant nodes.
>     >>
>     >>[hb-2] My approach was about NOT USING REPLICATION AT ALL but using
>     >>a shared data store. Though I never mentioned it explicitly (because I
>     >>regarded this as a finer detail) ... I always intended to have an
>     >>async system do the persistence.
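
A minimal sketch of the primary pushing a changed session to its replicant
nodes, synchronously or asynchronously as discussed above; the Transport
interface is hypothetical, standing in for whatever medium (JGroups, JMS,
...) is plugged in:

    import java.io.Serializable;
    import java.util.List;
    import java.util.concurrent.ExecutorService;
    import java.util.concurrent.Executors;

    // Sketch of the primary replicating a changed session to its replicant nodes.
    public class SessionReplicator {

        public interface Transport {               // hypothetical pluggable replication medium
            void send(String nodeId, String sessionId, Serializable state) throws Exception;
        }

        private final Transport transport;
        private final ExecutorService async = Executors.newSingleThreadExecutor();

        public SessionReplicator(Transport transport) { this.transport = transport; }

        public void replicate(String sessionId, Serializable state,
                              List<String> replicants, boolean asynchronous) throws Exception {
            for (String node : replicants) {
                if (asynchronous) {
                    async.submit(() -> {
                        try { transport.send(node, sessionId, state); }
                        catch (Exception e) { /* real code would log and perhaps re-elect replicants */ }
                    });
                } else {
                    transport.send(node, sessionId, state);   // block until the replicant has the change
                }
            }
        }
    }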
>     >>
>     >>If, for any reason, a session is required to 'migrate' to
>     another node
>     >>(fail-over or clusterwide-state-balancing), this 'target' node
>     makes a
>     >>request to the cluster for this session, the current 'source' node
>     >>handshakes and the migration ensues, after which the target node is
>     >>promoted to primary status.
>     >>
>     >>[hb-3] I am not sure how a target server can initiate a migration
>     >>process. I would imagine that on a fail-over, or on a systematic
>     >>removal of a node, or any other such action that requires cluster-wide
>     >>state-balancing, it is the lb/admin servers that would sense this
>     >>first and initiate actions.
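
For reference, the target-initiated handshake described above might reduce to
a sequence like the following; the Cluster interface and its method names are
invented, only the request/transfer/promotion ordering comes from the text:

    import java.io.Serializable;

    // Sketch of a target-initiated migration: request -> handshake/transfer -> promotion.
    public class SessionMigration {

        public interface Cluster {
            String findPrimary(String sessionId);                              // who currently owns it?
            Serializable requestTransfer(String sourceNode, String sessionId); // source locks, marshals, hands over
            void announcePrimary(String sessionId, String newPrimary);         // tell the cluster about the promotion
        }

        private final Cluster cluster;
        private final String localNode;

        public SessionMigration(Cluster cluster, String localNode) {
            this.cluster = cluster;
            this.localNode = localNode;
        }

        /** Pull a session to this node and promote this node to primary for it. */
        public Serializable migrateHere(String sessionId) {
            String source = cluster.findPrimary(sessionId);
            Serializable state = cluster.requestTransfer(source, sessionId); // source stops serving it first
            cluster.announcePrimary(sessionId, localNode);                   // promotion to primary status
            return state;
        }
    }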
>     >>
>     >>Any inbound request landing on a node that is not primary for the
>     >>required session results in a forward/redirect of the request to its
>     >>current primary, or a migration of the session to the receiving node
>     >>and its promotion to primary.
>     >>
>     >>[hb-4] Not sure when this scenario would occur - when would a
>     >>"secondary" receive an HTTP request while the primary is functioning
>     >>well?
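
A sketch of the two options for a request landing on a non-primary node
(redirect to the current primary, or migrate the session here and promote);
all names below are illustrative, not an existing API:

    // Decide what to do with a request whose session is not primary on this node.
    public class NonPrimaryRequestPolicy {

        public interface SessionRegistry {
            boolean isPrimaryHere(String sessionId);
            String primaryUrlFor(String sessionId);    // address of the current primary
            void migrateHereAndPromote(String sessionId);
        }

        private final SessionRegistry registry;
        private final boolean preferMigration;

        public NonPrimaryRequestPolicy(SessionRegistry registry, boolean preferMigration) {
            this.registry = registry;
            this.preferMigration = preferMigration;
        }

        /** Returns null if the request should be handled locally, otherwise a URL to redirect to. */
        public String route(String sessionId) {
            if (registry.isPrimaryHere(sessionId)) {
                return null;                                    // normal case under sticky load-balancing
            }
            if (preferMigration) {
                registry.migrateHereAndPromote(sessionId);      // adopt the session, then serve it locally
                return null;
            }
            return registry.primaryUrlFor(sessionId);           // forward/redirect to the real primary
        }
    }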
>     >>
>     >>A shared store is used to passivate sessions that have been inactive
>     >>for a given period, or are surplus to constraints on a node's
>     session
>     >>cache size.
>     >>
>     >>[hb-5] Yes, this is very much a point that I have been making since
>     >>my first proposal. Just to quote from that doc: "With a little
>     >>intelligence built in, an MS can store away less busy sessions to the
>     >>DB and retrieve them when needed, thus offering something that is
>     >>near to a virtually unlimited amount of sessions" (section 1.1.1.1).
>     >>
>     >>Once in the shared store, a session is disassociated from its primary
>     >>and replicant nodes. Any node in the cluster, receiving a relevant
>     >>request, may load the session, become its primary and choose
>     >>replicant nodes for it.
>     >>[hb-6] this is a good optimization to sit on top of the scheme
>     >>mentioned in [hb-5].
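
A sketch of the passivation/activation cycle described in the last few
paragraphs, assuming hypothetical Store and ReplicantChooser abstractions:

    import java.io.Serializable;
    import java.util.List;

    // Passivate inactive/surplus sessions to a shared store; any node may later
    // re-activate one, become its primary and pick new replicants for it.
    public class SessionPassivation {

        public interface Store {
            void write(String sessionId, Serializable state);
            Serializable read(String sessionId);
            void delete(String sessionId);
        }

        public interface ReplicantChooser {
            List<String> chooseReplicants(String forNode);
        }

        private final Store store;
        private final ReplicantChooser chooser;
        private final String localNode;

        public SessionPassivation(Store store, ReplicantChooser chooser, String localNode) {
            this.store = store;
            this.chooser = chooser;
            this.localNode = localNode;
        }

        /** Inactive or surplus session: push it to the store and drop its primary/replicant ties. */
        public void passivate(String sessionId, Serializable state) {
            store.write(sessionId, state);           // once here, no node owns it
        }

        /** Any node receiving a relevant request may load it, become primary and pick replicants. */
        public Serializable activate(String sessionId) {
            Serializable state = store.read(sessionId);
            if (state != null) {
                store.delete(sessionId);             // it lives in memory again
                List<String> replicants = chooser.chooseReplicants(localNode);
                // real code would now register localNode as primary and replicate to 'replicants'
            }
            return state;
        }
    }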
>     >>
>     >>Correct tuning of this feature, in a situation where frequent
>     >>migration is taking place, might cut this dramatically.
>     >>
>     >>
>     >>The reason for the hard node-level session affinity requirement
>     is to
>     >>ensure maximum cache hits in e.g. the business tier. If a web
>     session
>     >>is interacting with cached resources that are not explicitly tied to
>     >>it (and so could be associated with the same replicant nodes), the
>     >>only way to ensure that subsequent uses of this session hit
>     resources
>     >>in these caches is to ensure that these occur on the same node
>     as the
>     >>cache - i.e. the session's primary node.
>     >>
>     >>By only having one node that can write to a session, we remove the
>     >>possibility of concurrent writes occurring on different nodes
>     and the
>     >>subsequent complexity of deciding how to merge them.
>     >>
>     >>[hb-7] I completely agree.
>     >>
>     >>The above strategy will work for an 'implicit-affinity' lb (e.g.
>     BigIP),
>     >>which remembers the last node that a session was successfully
>     accessed
>     >>on and rolls this value forward as and when it has to fail-over to a
>     >>new node. We should be able to migrate sessions forward to the next
>     >>node picked by the lb, underneath it, keeping the two in sync.
>     >>
>     >>With an 'explicit-affinity' lb (e.g. mod_jk), where routing info is
>     >>actually encoded into the jsessionid/JSESSIONID value (or maybe an
>     >>auxiliary path param or cookie), it should be possible, in the
>     case of
>     >>fail-over, to choose a (probably) replicant node to promote to
>     primary
>     >>and to stick
>     >>requests falling elsewhere to this new primary by resetting this
>     >>routing info on their jsessionid/JSESSIONID and
>     redirecting/forwarding
>     >>them to it.
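
For the explicit-affinity case, mod_jk-style routing info is conventionally
carried as a suffix on the session id (Tomcat's jvmRoute, e.g. "ABC123.node1").
Resetting it on fail-over could be as simple as the sketch below; the class is
illustrative, only the dot-suffix convention is taken as given:

    // Rewrite the routing suffix so the lb sticks future requests to the new primary.
    public class ExplicitAffinityFailover {

        /** Replace the routing suffix on a jsessionid with the newly promoted node's route. */
        public static String rewriteRoute(String jsessionid, String newRoute) {
            int dot = jsessionid.lastIndexOf('.');
            String bareId = (dot >= 0) ? jsessionid.substring(0, dot) : jsessionid;
            return bareId + "." + newRoute;
        }

        public static void main(String[] args) {
            // e.g. node1 has failed and node2 (a replicant) has been promoted to primary:
            System.out.println(rewriteRoute("ABC123.node1", "node2")); // prints ABC123.node2
        }
    }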
>     >>
>     >>If, in the future, we write/enhance an lb to be Geronimo-aware,
>     we can
>     >>be even smarter in the case of fail-over and just ask the cluster to
>     >>choose a (probably) replicant node to promote to primary and
>     then direct
>     >>requests
>     >>directly to this node.
>     >>
>     >>The cluster should dynamically inform the lb about joining/leaving
>     >>nodes, and sessions should likewise maintain their primary/replicant
>     >>lists accordingly.
>     >>
>     >>[hb-8] I completely agree.
>     >>
>     >>LBs also need to be kept up to date with the locations and access
>     >>points of the various webapps deployed around the cluster, relevant
>     >>node and webapp stats (on which to base balancing decisions), etc...
>     >>
>     >>All of this information should be available to any member of the
>     >>cluster and a Geronimo-aware lb should be a full cluster member.
>     >>
>     >>On shutting down every node in the cluster all session state should
>     >>end up in the shared store.
>     >>
>     >>
>     >>These are fairly broad brushstrokes, but they have been placed after
>     >>some thought and outline the sort of picture that I would like
>     to see.
>     >>
>     >>Your thoughts ?
>     >>
>     >>
>     >>Jules
>     >>
>     >>
>     >>
>     >>
>     >>
>     >>
>     >>Bhagwat, Hrishikesh wrote:
>     >>
>     >>
>     >>
>     >>
>     >>
>     >>>>I am also not convinced it reduces the amount of net traffic.
>     After each
>     >>>>request the MS must write to the shared store, which is the
>     same traffic as
>     >>>>a unicast write to another node or a multicast write to the
>     partition
>     >>>>(discounting the processing power needed to receive the message).
>     >>>>
>     >>>>
>     >>>>
>     >>>>
>     >>>>
>     >>>>
>     >>>I agree. However, this is based on the assumption that only one
>     unicast
>     >>>write is required. In other words, this is a primary/secondary
>     topology. I
>     >>>think that hb did not intend such a topology, and hence his
>     >>>statement.
>     >>>
>     >>>[hb] Yes, I was not assuming a Pri/Sec design but a layout where any
>     >>>active server can be requested to pick up a client request destined
>     >>>for a server that has just failed.
>     >>>
>     >>>-----Original Message-----
>     >>>From: gianny DAMOUR [mailto:gianny_damour@hotmail.com]
>     >>>Sent: Sunday, October 19, 2003 7:35 AM
>     >>>To: geronimo-dev@incubator.apache.org
>     >>>Subject: [Re] Web Clustering : Stick Sessions with Shared Store
>     >>>
>     >>>
>     >>>Jeremy Boynes wrote:
>     >>>
>     >>>
>     >>>
>     >>>
>     >>>
>     >>>
>     >>>
>     >>>>However, as Andy says, the cost of storing a serialized object
>     in a BLOB is
>     >>>>significant. Other forms of shared store are available though
>     which may
>     >>>>offer better performance (e.g. a hi-av NFS server).
>     >>>>
>     >>>>
>     >>>>
>     >>>>
>     >>>>
>     >>>>
>     >>>Do we need a shared repository or a replicated repository?
>     >>>
>     >>>
>     >>>
>     >>>
>     >>>
>     >>>
>     >>>
>     >>>
>     >>>>The issue I have with hb's approach is the reliance on an
>     Admin Server, of
>     >>>>which there would need to be at least two and they would need
>     to co-operate
>     >>>>between themselves and with any load-balancers. I think this
>     can be handled
>     >>>>by the regular servers themselves just as efficiently.
>     >>>>
>     >>>>
>     >>>>
>     >>>>
>     >>>>
>     >>>>
>     >>>I agree. It seems that in such a design an Admin Server is only
>     used to
>     >>>route incoming requests to the relevant node.
>     >>>
>     >>>However, I do not believe that regular servers can do this job.
>     I assume
>     >>>that they will implement a standard peer-to-peer cluster
>     topology to provide
>     >>>redundancies, however I do not see how they can handle the
>     dispatch of
>     >>>incoming requests.
>     >>>
>     >>>This feature seems to be either a client or a proxy one: I mean it
>     >>>should be done prior to reaching the nodes.
>     >>>
>     >>>For instance, in WebLogic this feature is handled on the client
>     >>>side via a stub aware of the available nodes. It seems that JBoss
>     >>>(correct me if I am wrong) has also followed this design.
>     >>>
>     >>>
>     >>>
>     >>>
>     >>>
>     >>>
>     >>>
>     >>>>I am also not convinced it reduces the amount of net traffic.
>     After each
>     >>>>request the MS must write to the shared store, which is the
>     same traffic as
>     >>>>a unicast write to another node or a multicast write to the
>     partition
>     >>>>(discounting the processing power needed to receive the message).
>     >>>>
>     >>>>
>     >>>>
>     >>>>
>     >>>>
>     >>>>
>     >>>I agree. However, this is based on the assumption that only one
>     unicast
>     >>>write is required. In other words, this is a primary/secondary
>     topology. I
>     >>>think that hb did not intend such a topology, and hence his
>     >>>statement.
>     >>>
>     >>>Gianny
>     >>>
>     >>>
>     >>>
>     >>>
>     >>>
>     >>>
>     >>>
>     >>
>     >>
>     >>
>     >>
>     >
>     >
>     >
>     >
>     >------------------------------------------------------------------------
>     >
>     >
>     >Imagine a Cluster having N managed servers providing complete
>     >redundancy to each other. Imagine that they are all arranged in a
>     >circle.
>     >
>     >They continue to be one big cluster while the Internal Network
>     >traffic (INT) is less than a certain threshold. At this stage any
>     >server in the cluster is just as good as any other server in the
>     >cluster to provide fail-over support. This scheme cannot, however,
>     >scale well, largely due to 2 factors:
>     >
>     >1. As client requests increase, the number of server-to-server
>     >exchanges for Session replication would increase, stressing the network.
>     >
>     >2. There would be an added overhead for each server to process
>     >ALL of these Session objects.
>     >
>     >
>     > A ----------------- B ------------------- C ----------- D
>     > |   ---->                                               |
>     > |                                                       |
>     > |                                                       |
>     > J                                                       E
>     > |                                                       |
>     > |                                                       |
>     > |                                                       |
>     > I ----------------- H -------------------- G ----------- F
>     >
>     >
>     >Here is a solution :
>     >-------------------
>     >After the INT crosses a certain threshold value, "A" stops sending
>     >Session-updates to "the server before itself" (J). It also sends a
>     >notification to J to that effect. J can now forget all the sessions of
>     >A that it had stored up until now and thus gets a chunk of free memory.
>     >Each server does the same, thus "B cuts off A" and "C cuts off B" and
>     >so on. The number of exchanges now reduces from (n * (n-1)) to
>     >(n * (n-2)). In case of failure "A" still has a solid backup (all
>     >servers from B to I).
>     >
>     >After some more increase in the traffic the servers PURGE yet
>     >another server. Thus this time "A" will stop sending Session
>     >Objects to I (and of course J). Following that it will send a
>     >notification (YOU_ARE_PURGED) to I. On the other hand, "A" would
>     >have received a similar notification, this time, from C. However A
>     >still has a strong enough backup (all servers from B to H) to fall
>     >back on. Every machine holds a list of ALL peers (A-B-C-... in the
>     >correct order) and the current "cutoff-count" (no. of servers
>     >purged - this value is obviously the same for all the servers).
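
The backup-set arithmetic implied by the purge protocol can be stated
compactly: with the peers kept in ring order and a cluster-wide cutoff count,
a server's backups are its next (n - 1 - cutoff) successors. A sketch, with
invented names:

    import java.util.ArrayList;
    import java.util.List;

    // Compute the backup set implied by the ring + cutoff_count scheme described above.
    public class RingPartition {

        /** peers must be the full ordered ring (e.g. A,B,C,...,J), identical on every node. */
        public static List<String> backupsFor(String owner, List<String> peers, int cutoffCount) {
            int n = peers.size();
            int start = peers.indexOf(owner);
            int backups = Math.max(0, n - 1 - cutoffCount);   // shrinks as load (and cutoff_count) grows
            List<String> result = new ArrayList<>();
            for (int i = 1; i <= backups; i++) {
                result.add(peers.get((start + i) % n));        // successors, skipping the purged tail
            }
            return result;
        }

        public static void main(String[] args) {
            List<String> ring = List.of("A", "B", "C", "D", "E", "F", "G", "H", "I", "J");
            System.out.println(backupsFor("A", ring, 0));  // [B..J]  everyone backs A up
            System.out.println(backupsFor("A", ring, 1));  // [B..I]  J has been purged
            System.out.println(backupsFor("A", ring, 2));  // [B..H]  I and J have been purged
        }
    }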
>     >
>     >
>     >Note that, in a way, partitions are being created in the single
>     >large cluster. "A" OWNS a partition that originally contained
>     >B-J. It is also a member of 9 other partitions owned respectively
>     >by B, C ... J. As traffic and client load increase, A starts pushing
>     >out members of its partition, thus making it smaller. In effect it
>     >gets removed from some other servers' partitions (from B for
>     >cutoff_count = 1 and from C when cutoff_count = 2).
>     >
>     >As soon as a server is PURGED (goes out of a partition):
>     >
>     >1. It no longer needs to receive session-updates from the
>     >partition OWNER - saving network bandwidth.
>     >
>     >2. It need not remember any Session objects that were earlier
>     >given to it by THAT partition owner - thus allowing it to have
>     >more free memory - to be utilized for serving the increased client
>     >load.
>     >
>     >
>     >Under this scheme, I think, the cluster will have a fairly
>     >uniform turn-around time over a large range of client load.
>     >
>     >Even as all of this happens the LB will continue to allocate NEW
>     >CLIENTS in a simple round-robin fashion, while forwarding requests
>     >coming from all other clients, who already have an established
>     >session with a server, to their respective servers.
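
A sketch of that lb behaviour (round-robin for new clients, sticky for
established sessions); purely illustrative, no real load-balancer API is
being modelled:

    import java.util.List;
    import java.util.Map;
    import java.util.concurrent.ConcurrentHashMap;
    import java.util.concurrent.atomic.AtomicInteger;

    // Round-robin for new clients, sticky routing for clients with an established session.
    public class StickyRoundRobin {

        private final List<String> servers;
        private final AtomicInteger next = new AtomicInteger();
        private final Map<String, String> sessionToServer = new ConcurrentHashMap<>();

        public StickyRoundRobin(List<String> servers) { this.servers = servers; }

        public String route(String sessionId) {
            if (sessionId != null && sessionToServer.containsKey(sessionId)) {
                return sessionToServer.get(sessionId);                 // sticky: existing session
            }
            String server = servers.get(Math.floorMod(next.getAndIncrement(), servers.size()));
            if (sessionId != null) {
                sessionToServer.put(sessionId, server);                // remember the assignment
            }
            return server;
        }
    }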
>     >
>     >
>     >Now we shall discuss 4 special cases
>     >
>     >1. Failover
>     >
>     >
>     >Now imagine that "A" (actually any randomly selected server)
>     >fails. When the LB finds out that it is not able to reach "A" it
>     >simply forwards the client request (destined for A) to the next
>     >available server in the list, "B". B tries to serve the request
>     >but doesn't find the SessionID in its list of "NATIVE" sessions.
>     >THIS IS A SIGNAL TO "B" that "A" HAS failed. Now it goes on to do
>     >the following CRUCIAL ACTIVITY:
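
Only the detection signal described above is sketched here (the CRUCIAL
ACTIVITY that follows is not covered by the text above); names are
illustrative:

    import java.util.Set;

    // The lb has forwarded a request for a session that this server does not own
    // natively, which the server interprets as the owner having failed.
    public class NativeSessionCheck {

        private final Set<String> nativeSessions;   // sessions this server is the owner of

        public NativeSessionCheck(Set<String> nativeSessions) {
            this.nativeSessions = nativeSessions;
        }

        /** True when an inbound session id is not ours, i.e. the signal that its owner failed. */
        public boolean signalsOwnerFailure(String requestedSessionId) {
            return requestedSessionId != null && !nativeSessions.contains(requestedSessionId);
        }
    }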
>
>     === message truncated ===
>



-- 
/*************************************
 * Jules Gosnell
 * Partner
 * Core Developers Network (Europe)
 * http://www.coredevelopers.net
 *************************************/


