tomcat-users mailing list archives

From Mitch Claborn
Subject Cluster session replication performance
Date Thu, 06 Sep 2018 18:53:54 GMT
I'm using a cluster with the DeltaManager between two servers on Tomcat 
9.0.11. I've set channelSendOptions="8" (asynchronous session replication).

I have a "health check" app that I run periodically. One of its 
functions is to check that sessions are being replicated properly. 
That app:
1) Does a GET to tomcat A, calling a Struts action that creates a 
session and stores a known value in it
2) Waits 2 seconds
3) Uses the session ID cookie from step 1 and makes a call to tomcat B, 
to an action that retrieves that value from the session
4) Compares the two values from the session to make sure that they are 
the same.
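
A minimal sketch of the four steps above, in Java 11+. The URLs and the
class and method names are hypothetical, and it assumes both actions
return the stored value as the response body and that the session cookie
has the default name JSESSIONID; none of that is taken from the real app.

```java
import java.net.URI;
import java.net.http.HttpClient;
import java.net.http.HttpRequest;
import java.net.http.HttpResponse;
import java.time.Duration;
import java.util.Optional;

public class ReplicationCheck {

    // Pulls the JSESSIONID value out of one Set-Cookie header value,
    // e.g. "JSESSIONID=ABC123; Path=/app; HttpOnly".
    static Optional<String> extractSessionId(String setCookie) {
        for (String part : setCookie.split(";")) {
            String p = part.trim();
            if (p.startsWith("JSESSIONID=")) {
                return Optional.of(p.substring("JSESSIONID=".length()));
            }
        }
        return Optional.empty();
    }

    // Runs steps 1-4 once and returns true if B saw the value A stored.
    static boolean check(HttpClient client, URI createUri, URI readUri, Duration wait)
            throws Exception {
        // Step 1: GET to tomcat A; the action creates the session.
        // Assumption: the action echoes the stored value in the body.
        HttpResponse<String> created = client.send(
                HttpRequest.newBuilder(createUri).GET().build(),
                HttpResponse.BodyHandlers.ofString());
        String sessionId = created.headers().allValues("Set-Cookie").stream()
                .map(ReplicationCheck::extractSessionId)
                .flatMap(Optional::stream)
                .findFirst()
                .orElseThrow(() -> new IllegalStateException("no JSESSIONID from A"));
        String expected = created.body().trim();

        // Step 2: give replication time to happen.
        Thread.sleep(wait.toMillis());

        // Step 3: call tomcat B with the session ID cookie from step 1.
        HttpResponse<String> read = client.send(
                HttpRequest.newBuilder(readUri)
                        .header("Cookie", "JSESSIONID=" + sessionId)
                        .GET().build(),
                HttpResponse.BodyHandlers.ofString());

        // Step 4: compare the two values.
        return read.statusCode() == 200 && expected.equals(read.body().trim());
    }

    public static void main(String[] args) throws Exception {
        if (args.length < 2) {
            System.out.println("usage: ReplicationCheck <createUrlOnA> <readUrlOnB>");
            return;
        }
        boolean ok = check(HttpClient.newHttpClient(),
                URI.create(args[0]), URI.create(args[1]), Duration.ofSeconds(2));
        System.out.println(ok ? "replicated" : "NOT replicated");
    }
}
```

The wait is a parameter so the same check can be rerun with shorter
delays to see how the failure rate changes.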

Most of the time this check works fine, but occasionally the call to the 
second server finds that the session does not exist on that server, 
presumably because it has not replicated there yet. 2 seconds seems 
a long time for a session to replicate, especially one as small as this 
one. If I decrease the wait time at step 2, the failure rate 
increases.

I turned on the ThroughputInterceptor and have the following observation.
- Server A has a transmit throughput of around 10 MB/sec, while B has only 
around 3 MB/sec. This might be explained by the fact that B was the 
last server to start, so A would have (I think) transmitted all of the 
sessions at once when B started up, and might show good throughput from 
that one big send.

1. Is 2 seconds a long time to replicate a session?
2. Other than actual network slowness, are there internal issues that 
could cause the replication to be slow?
3. If so, is there any way to diagnose those?
4. I'm thinking about writing my own version of ThroughputInterceptor 
that will give more information on specific messages and timings. Has 
anyone tried that? In that interceptor can I access the session ID? That 
would help me correlate timings between my failure reports and the 


