geronimo-dev mailing list archives

From ikarzali <>
Subject Re: Effectiveness of WADI's Design and Implementation Comforted
Date Fri, 19 Oct 2007 01:08:50 GMT

Jules Gosnell-2 wrote:
> So, WADI offers a delta-based replication route as well - but this was 
> not the one under test.
I am not sure what the goal is here.  Is WADI trying to compete directly
with Terracotta?  I don't think so.  Terracotta is general-purpose
clustering: it can share almost any POJO (without the replay concept),
preserves object identity, and clusters thread coordination as well.  It
sounds like WADI's field-level replication is tied to the HttpSession
lifecycle.  Not that such an idea is bad, just different.  Field-level
replication in Terracotta differs from a conceptual "replay" in that it
pushes only the deltas to complex fields and object hierarchies, anywhere in
the hierarchy.  It also avoids a risk that "replay" carries: running methods
more than once can have side effects beyond the update of the session
attributes.  And Terracotta has thread coordination, so consistency and
coordination across JVMs are possible just as they are across threads.  Now,
a webapp should not spawn its own threads, but the container is still
multi-threaded, and race conditions can easily occur if concurrency is not
properly isolated.

I go into more detail below as to how our approach to field-level
replication differs from "replay."
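
To make the contrast concrete, here is a minimal, hypothetical sketch (not WADI's or Terracotta's actual implementation) of delta-based replication: the session records which attributes were written so a replicator can ship only the changed entries, instead of re-running ("replaying") application code on a backup node. Re-applying a delta is idempotent, which is exactly why it avoids the replay side-effect risk discussed above. All class and method names here are invented for illustration.

```java
import java.util.HashMap;
import java.util.HashSet;
import java.util.Map;
import java.util.Set;

// Hypothetical sketch: a session that tracks dirty attributes so only
// field-level deltas need to cross the wire, never replayed method calls.
public class DeltaTrackingSession {
    private final Map<String, Object> attributes = new HashMap<>();
    private final Set<String> dirty = new HashSet<>();

    public void setAttribute(String name, Object value) {
        attributes.put(name, value);
        dirty.add(name); // remember the write; no method replay needed later
    }

    public Object getAttribute(String name) {
        return attributes.get(name);
    }

    // Extract only the changed entries and reset the dirty set.
    public Map<String, Object> takeDelta() {
        Map<String, Object> delta = new HashMap<>();
        for (String name : dirty) {
            delta.put(name, attributes.get(name));
        }
        dirty.clear();
        return delta;
    }

    // Apply a delta received from another node. This is idempotent:
    // applying it twice leaves the session in the same state, with no
    // side effects outside the session's attribute map.
    public void applyDelta(Map<String, Object> delta) {
        attributes.putAll(delta);
    }
}
```

Note that after the first `takeDelta()`, a second write to a single attribute produces a delta of size one, regardless of how large the session is.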

Jules Gosnell-2 wrote:
> Ari,
> I know nothing about Terracotta, so would you mind spending a little 
> more time comparing and contrasting architectures...
> I agree that the best case scenario is fully functional stickiness - but 
>   then you would simply be testing session replication - see Gianny's 
> "Session Replication Test" explanation and results.

I think I failed to explain Terracotta well in my previous email, so I will
try with more detail (apologies for the rambling email at this point).  We
have a library that sits inside your JVM, giving us access to your reads and
writes to the heap.  We then have a central server cluster whose job it is
to keep all the JVMs in sync.  We keep track of which JVM has which
references resident and active, and our server then pushes only field-level
deltas, and only to the JVMs that need them.  This is not a session
replication product.  It is cross-JVM POJO coordination at the heap level.

In a session context, this means many things (not good or bad, just reality
of Terracotta):

1. If any node needs a session, it looks it up from the SessionManager.
Terracotta sits inside the SessionManager watching for heap reads.  When a
request with a session arrives (a cookie already set in the browser and
passed in the request) and that session is not already local, neither the
SessionManager nor the JVM ever realizes it: Terracotta pulls the
appropriate session object (without pulling all its attributes) into the JVM
that needs it.  In effect, any session is available ANYWHERE at any time
without ALL sessions being everywhere all the time.  What's more, attributes
fault in lazily as they are requested and references are traversed.  Hence
Jeff G's assertion that TC can handle large session objects (I know of one
TC user with 15 MB session graphs).
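
The lazy fault-in described in point 1 can be sketched roughly like this. This is an invented illustration, not Terracotta's API: the `Function` stands in for the cluster server, and the point is simply that attribute values cross the network only on first read, so pulling a session into a JVM does not pull its whole attribute graph with it.

```java
import java.util.HashMap;
import java.util.Map;
import java.util.function.Function;

// Hypothetical sketch of lazy attribute fault-in: values are fetched from a
// stand-in remote store only on first read; later reads are pure local heap.
public class LazySession {
    private final Map<String, Object> local = new HashMap<>();
    private final Function<String, Object> remoteStore; // stands in for the cluster server
    private int faults = 0; // counts how many attributes were actually fetched

    public LazySession(Function<String, Object> remoteStore) {
        this.remoteStore = remoteStore;
    }

    public Object getAttribute(String name) {
        return local.computeIfAbsent(name, n -> {
            faults++;                    // a real fault: cross the "network"
            return remoteStore.apply(n); // pull just this one attribute in
        });
    }

    public int faultCount() {
        return faults;
    }
}
```

A 15 MB session graph behaves well under this scheme because a request that touches two attributes faults in two values, not the whole graph.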

2. Since Terracotta is heap-level in nature, and it knows which objects are
where, it can push session updates made in one JVM to other JVMs.  If a
session is resident in only a single JVM, Terracotta stores the change in
our own server just in case of JVM-level failure, but does not push the
change to any other node.  If a session has been balanced to more than one
server, those JVMs are automatically connected in a conversation about that
session.  It's neither peer-to-peer nor client-server.  It is, instead,
locality of reference: trying to keep objects (sessions, in this case)
consistent where they are needed while simultaneously keeping them from
leaking to where they are not.
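
The routing rule in point 2 can be sketched as a small, hypothetical coordinator (again, invented names, not Terracotta's internals): it tracks which nodes hold a copy of each session and computes the push targets for a write as "every resident node except the writer", which is empty in the single-resident case.

```java
import java.util.HashMap;
import java.util.HashSet;
import java.util.Map;
import java.util.Set;

// Hypothetical sketch of locality-aware update routing: deltas go only to
// nodes that actually hold a copy, never to the whole cluster.
public class UpdateRouter {
    private final Map<String, Set<String>> residentOn = new HashMap<>();

    // A node has faulted the session in.
    public void checkOut(String sessionId, String node) {
        residentOn.computeIfAbsent(sessionId, k -> new HashSet<>()).add(node);
    }

    // A node has dropped its copy.
    public void evict(String sessionId, String node) {
        Set<String> nodes = residentOn.get(sessionId);
        if (nodes != null) nodes.remove(node);
    }

    // Nodes that must receive a delta written on `writer`.
    public Set<String> pushTargets(String sessionId, String writer) {
        Set<String> targets = new HashSet<>(residentOn.getOrDefault(sessionId, Set.of()));
        targets.remove(writer); // the writer already has the new state
        return targets;
    }
}
```

With sticky load balancing most sessions stay single-resident, so `pushTargets` is usually empty and the only write is the durability copy to the server; round-robin makes every session multi-resident and the target set grows with the cluster.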

3. Since Terracotta is heap-level in nature, it gets a big performance boost
on read.  In WADI's case this may be true as well, but Terracotta allows the
SessionManager to trust its own session map, so all reads of sticky session
data come from local heap without a network call of any sort.  Some
clustering solutions can't allow this at scale in a round-robin scenario:
they store a session on a single node with n replicas, but every node that
is not the home node of that session must read and write through the home
node and cannot cache the session data.

In theory, this information explains some of the test results:
1. Round-robin puts the session in every JVM, so the TC cluster is
maintaining consistency everywhere, while WADI might be asynchronous or may
be going to the same node for session storage.  Note that round-robin is not
how TC is tuned for sessions.  If you want to cluster and share every object
on all nodes, TC tends to outrun most solutions in such highly contended
write scenarios.  But webapps should not be configured so that a session is
needed across requests while round-robin balancing forces nodes to
constantly fault in session data just to handle each request.

2. Your Terracotta server is on the same machine as your test cluster and is
thus being throttled by the test itself.  Once TC is throttled, the app
above it becomes throttled too, simply because TC is blocking heap-level
reads and writes while it fights for CPU to push updates around.  So if you
run TC on the same laptop as your test, our latency comes mostly from CPU
context switching, not from our software.  It's not a valid test.  It's like
running Oracle on the same node as your web app, and then, in the middle of
a request that maxes out the CPU because your container is chewing on
something, asking Oracle to serve up a query.  Oracle's latency would then
measure as HUGE, when in fact Oracle on another box might return that answer
sub-millisecond if it had enough RAM to keep some data in cache.
Starvation is just that...

Jules Gosnell-2 wrote:
> !! - I would hope that Terracotta doesn't always do one-to-all 
> replication and should point out that my reading of "One again, the cost 
> of keeping one session replica, i.e. the cost of keeping a copy of a 
> session on an another node than the one currently owning the session, 
> remains constant while the number of nodes in the cluster increases for 
> WADI." (see the Session Migration & Replication Test" observations), 
> indicates that WADI is running in a 1+1 configuration, that means it IS 
> replicating - each session has 1 (configurable) backup session off-node. 
> and therefore no SPoF either.
> and a different test will more accurately reflect the relative
>> performance of the systems.
> This is an interesting thread - but I think that there is a lot of 
> misunderstanding/miscommunication occurring that it would be useful to 
> clear up :-)
> Jules

I must have misread that WADI had replication disabled, since sessions had a
home node in the test and any replication would be two-stage versus
Terracotta's one-stage.  My misunderstanding.

I am not being defensive here.  After all, I know we have a feature on the
roadmap to make round-robin fast.  Such a feature would basically cause the
JVM running your web container to forget it saw a session right after it
responded to a request.  Since TC can always hand a particular JVM a session
reference just in time on heap read, why not drop the session and avoid the
broadcast storm, right?  We don't have that feature at the moment, however,
so any round-robin test of us will work but might not be fast.  Sticky
sessions with a TC server on its own box is the way to go.
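
The "forget after the request" idea above can be sketched as follows. This is a speculative illustration of a roadmap feature, not shipping behavior, and the names are invented: the local node drops its copy of the session as soon as the response is sent, so under round-robin there are never stale copies to broadcast to; the next request simply faults the session back in wherever it lands.

```java
import java.util.HashMap;
import java.util.Map;

// Hypothetical sketch of a "forgetful" per-node session cache: the copy is
// dropped right after each request, relying on just-in-time fault-in for
// the next one, so writes never need to be broadcast to other nodes.
public class ForgetfulSessionCache {
    private final Map<String, Object> resident = new HashMap<>();
    private final Map<String, Object> clusterStore; // stands in for the TC server

    public ForgetfulSessionCache(Map<String, Object> clusterStore) {
        this.clusterStore = clusterStore;
    }

    public Object handleRequest(String sessionId) {
        // Fault the session in if it is not local (just-in-time on heap read).
        Object session = resident.computeIfAbsent(sessionId, clusterStore::get);
        // ... application work on the session would happen here ...
        resident.remove(sessionId); // forget immediately after responding
        return session;
    }

    public boolean isResident(String sessionId) {
        return resident.containsKey(sessionId);
    }
}
```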

I want to make sure everyone who wants to know what Terracotta is good at,
and how fast it can go, is aware of all the content at  I recommend reading
it before passing judgment based on these graphs.  Also, I don't know if I
already extended the offer, but I would love to get on a Skype call or
concall with anyone who wants to dig deeper into the two approaches.  I am
happy to help WADI learn from what we do, and vice versa.  I think there is
a lot of value in sharing.

Kewl stuff you guys are working on.  Keep it up!


BTW, I wasn't suggesting that anyone was cheating.  I meant that the test is
unfair, based on assumptions about the product.  No harm, no foul.
