tomcat-users mailing list archives

From kharp...@oreillyauto.com
Subject Re: Tuning session replication on clusters
Date Fri, 07 Sep 2012 16:19:04 GMT
Chris:
>Assembling the sessions into a Collection is likely to be very fast,
>since it's just copying references around: the size of the individual
>sessions should not matter. Of course, pushing all those bytes to the
>other servers...

>Perhaps Tomcat does something like serialize the session to a big
>binary structure and then sends that (which sounds insane -- streaming
>binary data is how that should be done -- but I haven't checked to
>code to be sure).

It appears that Tomcat is serializing all the data into a single
structure, rather than building a collection of references.  Watching
VisualVM plot heap usage during replication, usage nearly doubles (in my
test env this was the only thing running, so that makes sense).  If you're
sure Tomcat is only making references, then I'd propose there is a problem
with the JVM dereferencing the collection elements and double-counting the
memory used.  Either way, it's enough to make the JVM report a doubling of
heap usage and an increase in the heap allocation.  As soon as replication
is done, heap use goes back to normal.  I've included a screenshot in the
attached zip file.
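
To make the distinction concrete, here is a minimal sketch of the two
approaches being discussed. It is not Tomcat's code: the class and method
names are hypothetical, and a session is modeled as a plain attribute map
holding a byte array. Copying references shares the existing objects, while
serializing everything into one in-memory buffer creates a second full copy
on the heap, which would be consistent with the doubling seen in VisualVM.

```java
import java.io.ByteArrayOutputStream;
import java.io.IOException;
import java.io.ObjectOutputStream;
import java.io.Serializable;
import java.io.UncheckedIOException;
import java.util.ArrayList;
import java.util.HashMap;
import java.util.List;
import java.util.Map;

// Hypothetical illustration only; this is NOT how Tomcat models sessions.
class SessionCopyDemo {

    /** Builds fake sessions: each is an attribute map with one byte[] payload. */
    static List<Map<String, Serializable>> makeSessions(int count, int bytesPerSession) {
        List<Map<String, Serializable>> sessions = new ArrayList<>();
        for (int i = 0; i < count; i++) {
            Map<String, Serializable> attrs = new HashMap<>();
            attrs.put("payload", new byte[bytesPerSession]);
            sessions.add(attrs);
        }
        return sessions;
    }

    /** Copies references only: the new list points at the same session objects. */
    static <T> List<T> collectReferences(List<T> sessions) {
        return new ArrayList<>(sessions);
    }

    /** Serializes every session into one in-memory buffer (a second full copy). */
    static byte[] serializeAll(List<Map<String, Serializable>> sessions) {
        ByteArrayOutputStream buffer = new ByteArrayOutputStream();
        try (ObjectOutputStream out = new ObjectOutputStream(buffer)) {
            for (Map<String, Serializable> session : sessions) {
                out.writeObject(session);
            }
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
        return buffer.toByteArray();
    }

    public static void main(String[] args) {
        // 200 sessions of ~100 KB each, roughly the ~20 MB step used in the tests.
        List<Map<String, Serializable>> sessions = makeSessions(200, 100 * 1024);
        List<Map<String, Serializable>> refs = collectReferences(sessions);
        byte[] blob = serializeAll(sessions);
        System.out.println("same objects shared: " + (refs.get(0) == sessions.get(0)));
        System.out.println("serialized bytes:    " + blob.length);
    }
}
```

The serialized buffer is at least as large as the combined payloads, so a
manager that builds such a buffer before sending would briefly hold the
session data twice.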


Now for data:
I ran tests in steps of 200 sessions (~20 MB per step): 200, 400, 600... up
to 3000.  I then tested in steps of 1000 (3000, 4000, 5000... up to 10k).
At no point did I receive any exceptions or OOME issues.  Heap usage never
climbed above 60% of Xmx.  My lab was isolated to help give consistent
results.  Here are some points.

1.  There is a pivotal point where replication performance degrades
dramatically.  In my tests, this happened around 2400-2600 sessions.  I
restarted tomcat and was able to avoid the issue, until I hit 2800 sessions
(~300 MB total session data).  There was a 153% jump in time required to
perform replication at this point.  From there, each subsequent test took
marginally longer per session (15-25%) than the test before it.  Chris was
correct: it's not exponential, but the ms/session gets worse and worse as
we climb.  I have no explanation for the sharp jump or the continued
degradation.  I've seen similar performance issues with sorting and
comparison logic, but those don't make sense here.  Perhaps this
serialized object is being jerked around Young Gen/Old Gen and having to be
constantly reallocated?  Grasping at straws here...

2.  Networking is a large portion of the bottleneck for large data sets.
The thread size and pool size attributes of the sender/receiver had no
impact on throughput.  Also, a packet capture revealed nothing naughty
happening.  However, the rxBufSize and txBufSize values on the Nio receiver
and the PooledParallel transport elements made a profound difference.  I
generated 7000 sessions (~700 MB) and used default settings:  74 sec.
After increasing the rx/tx settings by x5, I was able to replicate the
sessions in 33 sec.  Gains beyond x5 were almost nil: even x100 (which is
absurd) only brought replication down to 29.3 sec.
A simple SCP transfer of a 700 MB file (using tmpfs folders) between these
same two systems took 13 seconds.
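
The timings above are easier to compare as effective throughput. The
following sketch only does that arithmetic on the figures quoted in this
message (~700 MB in each case); the class and method names are made up for
illustration.

```java
import java.util.Locale;

// Converts the timings reported above into effective throughput.
class ReplicationThroughput {

    /** Effective throughput in MB/s for a transfer of the given size and duration. */
    static double mbPerSec(double megabytes, double seconds) {
        return megabytes / seconds;
    }

    public static void main(String[] args) {
        double sizeMb = 700.0; // ~700 MB of session data, as reported above
        String[] labels = {"default buffers", "buffers x5", "buffers x100", "scp baseline"};
        double[] seconds = {74.0, 33.0, 29.3, 13.0};
        for (int i = 0; i < labels.length; i++) {
            System.out.println(String.format(Locale.ROOT, "%-16s %5.1f s  %5.1f MB/s",
                    labels[i], seconds[i], mbPerSec(sizeMb, seconds[i])));
        }
    }
}
```

That works out to roughly 9.5 MB/s with defaults, about 21 MB/s at x5, about
24 MB/s at x100, and about 54 MB/s for the SCP baseline.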

My conclusion is that tuning the network was obviously a great help, but it
still took 30 seconds to replicate 700MB worth of session data on a network
with enough throughput to perform the transfer in 13 seconds.  I don't know
if further network settings could be changed for the DeltaManager to aid in
speeding up replication, but given the spike in memory use and the pivotal
performance drop at a consistent point I'm inclined to think we're hitting
some edge case regarding session size and memory settings (Xmx/Heap and
NewSize/SurvivorRatio).  As Chris said, if Tomcat isn't collecting just
references, it probably should be.
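
For anyone who wants to log the heap figures discussed here without
VisualVM, a small sketch using the standard java.lang.management API is
below. The class name and the percentage helper are hypothetical; it would
need to run inside the JVM being measured (or be adapted to a remote JMX
connection), which is an assumption about how you would wire it in.

```java
import java.lang.management.ManagementFactory;
import java.lang.management.MemoryUsage;

// Reports current heap use relative to the maximum heap (Xmx).
class HeapSnapshot {

    /** Used heap as a percentage of max, or -1 when the max is undefined. */
    static double usedPercentOfMax(long usedBytes, long maxBytes) {
        if (maxBytes <= 0) {
            return -1.0; // MemoryUsage.getMax() returns -1 when no max is defined
        }
        return 100.0 * usedBytes / maxBytes;
    }

    public static void main(String[] args) {
        MemoryUsage heap = ManagementFactory.getMemoryMXBean().getHeapMemoryUsage();
        System.out.println("used bytes:      " + heap.getUsed());
        System.out.println("committed bytes: " + heap.getCommitted());
        System.out.println("max bytes:       " + heap.getMax());
        System.out.println("used % of max:   "
                + usedPercentOfMax(heap.getUsed(), heap.getMax()));
    }
}
```

Sampling this before, during, and after a replication run should show
whether used heap roughly doubles and whether the committed size grows.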

Feel free to pick apart my data or thoughts.  I tried to be as analytical
as possible, but there's a lot of conjecture in here.

(See attached file: SessionResearch.zip)
If the list strips it, you can find a copy here:
https://docs.google.com/open?id=0B876X8DOwh8peEkyZVd6RVc4cWc

Thanks.

Kyle Harper