axis-java-dev mailing list archives

From "Rajith Attapattu" <>
Subject Re: [Axis2] Adding ClusterManager code the the codebase
Date Tue, 13 Feb 2007 00:07:58 GMT
Hi Sanjaya,

Sorry for not replying sooner.
Comments inline as usual marked with [RA].



On 2/8/07, Sanjaya Karunasena <> wrote:
> Hi Rajith,
> Please find my comments inline.
> Regards,
> Sanjaya
> On Thursday 08 February 2007 09:10, Rajith Attapattu wrote:
> > Hey Sanjaya,
> >
> > It is indeed turning out to be a good conversation.
> > comments inline.
> >
> > Regards,
> >
> > Rajith
> >
> > On 2/7/07, Sanjaya Karunasena <> wrote:
> > > [SK] So why not use Synapse? Of course there is an option of embedding a
> > > simple but fast load balancer as the default load balancer. It is always
> > > good to have different configurations available for different requirements
> > > when it comes to application development. The only thing required is some
> > > good documentation.
> > [RA] Synapse is certainly an option.
> >
> > > [SK] Certainly, starting with small steps is always important and works.
> > > But let's keep the discussion going so that we keep an eye on the final
> > > goal while doing that.
> > [RA] Totally agree.
> >
> > > For the message ordering provided by the communication framework to work,
> > > it should be notified of all the dependent events. However, there is a cost
> > > associated with this. The question is where do you invest? Which approach
> > > handles concurrency with the least cost?
> >
> > > Let me explain how total ordering is going to work. In total ordering,
> > > message m is delivered to all the recipients before message m+1 is
> > > delivered. When event execution is synchronous, the event will be
> > > automatically blocked until the message is delivered.
> >
> > > This way, if a write happens at time t and a read starts concurrently at
> > > time t+1, the event will be automatically blocked until the write is
> > > delivered to all the recipients. Which event occurred first (happened
> > > before) can be determined using the Lamport algorithm.
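
For reference, here is a minimal Java sketch of the Lamport logical clock mentioned above, to make the "happened before" ordering concrete. The class names LamportClock and Stamp and the nodeId tie-breaker are assumptions made for illustration; they are not part of Axis2 or of any framework discussed in this thread. Lamport timestamps only guarantee that if one event happened before another then its timestamp is smaller. The reverse does not hold, so concurrent events end up ordered arbitrarily (here, by node id).

```java
import java.util.concurrent.atomic.AtomicLong;

/** Minimal Lamport logical clock (illustrative only, not an Axis2 class). */
final class LamportClock {
    private final AtomicLong time = new AtomicLong(0);

    /** Local event or message send: advance the clock and return the new timestamp. */
    long tick() {
        return time.incrementAndGet();
    }

    /** Message receipt: move past the sender's timestamp and return the new local time. */
    long onReceive(long remoteTimestamp) {
        return time.updateAndGet(local -> Math.max(local, remoteTimestamp) + 1);
    }

    /** Current clock value, without advancing it. */
    long now() {
        return time.get();
    }
}

/** Timestamp plus node id: a total order that is consistent with happened-before. */
final class Stamp implements Comparable<Stamp> {
    final long time;
    final int nodeId; // hypothetical tie-breaker for events with equal timestamps

    Stamp(long time, int nodeId) {
        this.time = time;
        this.nodeId = nodeId;
    }

    @Override
    public int compareTo(Stamp other) {
        int byTime = Long.compare(time, other.time);
        return byTime != 0 ? byTime : Integer.compare(nodeId, other.nodeId);
    }
}
```

In this sketch a node would call tick() before multicasting a write and onReceive() when a write arrives, so a read that starts after the write was received always carries a larger timestamp than the write.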
> >
> > [RA] If we block for reads, then to be sure that nobody is writing while we
> > are reading we need to wait until we have the "virtual token", since no node
> > can write until it acquires the token. This will be very slow, won't it?
> > This may be acceptable for writes, but for obvious performance reasons we
> > will have to live with dirty reads.
> > Also, blocking the service from reading or writing cannot be done without
> > modifications to (or impact on) the kernel, which is going to be shot down
> > for sure :)
> >
> > I am already getting a beating for performance :)
> [SK] I can see that in the mailing list :-). But if you carefully analyze it,
> with locking you do the same without having any reliability guarantees, plus
> the additional burden of having to tackle distributed locking.
> However, IMO whether to live with dirty reads or not should be the choice of
> the application developer. We have no right to lock them in to something.
> Obviously they will then look for some other solution where they have more
> flexibility.
> As I stated earlier, performance is important, but if you want scalability,
> reliability, etc. you have to give up some of it. You can't eat your cake and
> have it too. :-)
> But yes, when there is no clustering, the new code added should not degrade
> performance.
> What we need to understand is that a reliable group communication framework
> allows us to make some assumptions on reliability, message ordering, etc.,
> which helps us to simplify the algorithms in the upper layers.

[RA] That was exactly my point. Keep the upper levels as simple as possible
and allow the infrastructure to do the magic.
That is why I was opposed to blocking reads and updates at the Axis2 level to
enforce total data integrity. If the underlying implementation can do some
magic then great, but let's keep the core Axis2 code simple.

> So there is a saving for the initial cost we incur. Certainly, we should
> evaluate the return on investment.

[RA] I agree that we shouldn't impose a locking strategy or anything else on
the end user, and that we need to provide a choice.

Currently (as per the discussions we had on this subject) there are a
couple of decisions we are leaving to the end user. These options can be
combined to create an acceptable solution. See if these are acceptable to

1) Choice of replication/group communication framework.
   If we implement the ClusterManager with several group communication
frameworks (Ricochet, Tribes, Evs4J, etc.) then the end user can choose a
strategy that best fits their needs. All these frameworks have different
levels of guarantees for reliability, scalability, performance, etc.

2) Choice of replication strategy.
  a) Container managed - We have predetermined replication points to
replicate state. Currently this is at the end of an invocation.
  b) Service managed - The service author decides the replication points and
the frequency with which replication is called.

In a sticky session use case (a) would be acceptable, as the same service
won't be accessed concurrently on two nodes (hopefully :).
Also, in an active-passive use case (a) is the most reasonable choice.

If "business logic" is executed asynchronously, or if the invocation is long
running, or if there are no sticky sessions, then (b) would be a safe bet.
The service author can call updateState or flush whenever he/she thinks
it's appropriate. It could be after every property change or based on some
other criteria. This, coupled with a group communication framework that
implements total ordering, would ensure the desired result (or at least
something close).
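
To make the two strategies concrete, here is a rough Java sketch. It assumes a hypothetical ReplicationChannel interface standing in for whichever group communication framework is plugged in, and a hypothetical ReplicatedContext that buffers property changes until flush() is called. None of these types are the actual Axis2 clustering API; they only illustrate the idea. Repeated writes to the same property are coalesced, so only the latest value goes on the wire.

```java
import java.util.ArrayList;
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;

/** Hypothetical transport abstraction; one implementation per group communication framework. */
interface ReplicationChannel {
    void broadcast(Map<String, Object> changes);
}

/** Hypothetical replicated context that buffers changes until flush() is called. */
final class ReplicatedContext {
    private final ReplicationChannel channel;
    private final Map<String, Object> state = new LinkedHashMap<>();
    private final Map<String, Object> dirty = new LinkedHashMap<>();

    ReplicatedContext(ReplicationChannel channel) {
        this.channel = channel;
    }

    void setProperty(String key, Object value) {
        state.put(key, value);
        dirty.put(key, value); // repeated writes to one key keep only the latest value
    }

    Object getProperty(String key) {
        return state.get(key);
    }

    /** Replication point: send pending changes, if any, and clear the buffer. */
    void flush() {
        if (dirty.isEmpty()) {
            return;
        }
        channel.broadcast(new LinkedHashMap<>(dirty));
        dirty.clear();
    }
}

/** Strategy (a), container managed: flush once, at the end of the invocation. */
final class Container {
    static void invoke(ReplicatedContext ctx, Runnable service) {
        service.run();
        ctx.flush();
    }
}

/** In-memory channel that records broadcasts; it exists only to make the sketch runnable. */
final class RecordingChannel implements ReplicationChannel {
    final List<Map<String, Object>> sent = new ArrayList<>();

    @Override
    public void broadcast(Map<String, Object> changes) {
        sent.add(changes);
    }
}
```

With strategy (a) a service that sets "count" twice during one Container.invoke call produces a single broadcast carrying the last value. With strategy (b) the service author calls ctx.flush() directly at whatever points make sense, for example after every property change.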

> > > A relaxed approach is to use causal ordering of messages, if the causal
> > > order of events can be determined. There, events for which the order
> > > cannot be determined are treated as independent and no ordering is
> > > enforced on them.
> >
> > [RA] The paper on TOTEM claims the same performance as causal ordering or
> > even FIFO delivery. But I am not sure how accurate that claim is.
> >
> > > Sounds very expensive ha... :-) But if you really look at it, locking
> > > techniques essentially do the same while giving you the additional
> > > overhead of tackling distributed deadlocks.
> > [RA] Well the research paper says so :)
> > This approach is good if we replicate attributes as and when a change
> > occurs. But if a service does too many writes during an invocation it will
> > be a big performance issue and increase network chatter considerably. If
> > it updates the same variable several times during an operation it would be
> > a waste of resources.
> > If we replicate at the end of an invocation the chances of conflicts go
> > up. In such a case, distributed locking may be a more viable solution.
> >
> [SK] The multicasting techniques used by group communication frameworks are
> not like IP multicasting. Messages only get delivered to the group. An
> algorithm which tackles distributed locking also needs to do something very
> similar. I couldn't find any research work on this. Well, this could be the
> one :-). But you are right, this is not going to work if we replicate things
> at the end of an invocation. At the same time we need to evaluate the
> question raised by Chamikara.

[RA] Yes, so WADI seems to be the only infrastructure that does locking.
I am not for or against distributed locking or total ordering. I am against
making those decisions at the Axis2 level (refer to my previous
emails). I think the strategy should come from the replication framework we

> > > [SK] OK I think I got your point. But then it nullifies the ability to
> > > make use of the real power provided by the underlying messaging
> > > infrastructure. It will only be used as a multicasting channel and we
> > > have to come up with techniques to tackle everything else.
> > [RA] Not sure I understand you here (as to how it nullifies the ability to
> > leverage ...). Can you explain this a bit more?
> [SK] As you may have already read, the following are some of the attractive
> properties of a Reliable Group Communication environment.
> * Virtual Synchrony
> * Reliable multicast with Message Ordering
> * Group membership services
> Due to the many layers they have implemented to tackle these, the moment you
> employ one of these you are absorbing a cost. So we then need to seek to get
> the maximum out of it. But let me read a bit more on some of the work which
> is already done in this area. The JBoss clustering implementation may be
> worth looking at. The following two papers are also worth looking at.

[RA] As I said I am not against total ordering :)
Tribes is not IP multicasting either. It's a group membership communication
environment built on peer-to-peer communication.
I personally don't know whether IP multicasting or peer-to-peer is best. (I
have heard all types of arguments on this topic.)

JGroups doesn't implement virtual synchrony either. It does have an
experimental version based on Totem, which is not production quality.
Besides, the license is LGPL.

Ricochet does not have group membership and it is based on IP multicasting
(Chamikara, correct me if I am wrong).

WADI is built on top of Tribes (or other group coms) and provides
distributed locking.

I am interested in all these technologies and am not for or against any.
Let's experiment with them as time permits and let the end user choose what
they want depending on their situation.

> > > [SK] Have you checked Appia and the stuff developed at Cornell? As I told
> > > you, we may get away with causal ordering too.
> > [RA] We talked with Prof. Ken Birman and looked at Ricochet. That's what
> > they recommended to us.
> > The problem with Ricochet is that it doesn't have membership. But it does
> > have some interesting guarantees about performance, especially when the
> > number of nodes goes up. But this was a year ago. I am thinking about
> > restarting the discussion. They may have added membership to Ricochet.
> > I am actually interested in doing another clustering impl with Ricochet
> > (now that we have some groundwork in place).
> >
> [SK] Do you mean the membership service?

[RA] Yes.

> Anyway, I am talking about a different way of doing this stuff. So this
> certainly needs some research investment.

[RA] What did you mean by a different way? I am sorry, I think I didn't
understand the context clearly.

> Regards,
> >
> > Rajith.