tomcat-users mailing list archives

From Mitch Claborn
Subject Re: Cluster session replication performance
Date Mon, 10 Sep 2018 16:33:26 GMT
Further information and questions.

I created my own interceptor based on ThroughputInterceptor so that I 
could log the timing of specific sessions to correlate them with the 
failures in my health check program.  I was surprised to find that in 
those instances where the health check reported a failure, the 
interceptor reported that the session send was accomplished in < 5 ms, 
while the health check app is waiting a full 1000 ms between calls to 
the different tomcat instances. So now I'm more confused than ever.
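
A minimal sketch of the kind of per-send timing described above. It is not the actual interceptor: NextSender and TimingSender are hypothetical stand-ins so the example compiles without Tomcat on the classpath. In Tribes itself the equivalent would presumably extend ChannelInterceptorBase and wrap the sendMessage(Member[], ChannelMessage, InterceptorPayload) call in the same way.

```java
import java.util.ArrayList;
import java.util.List;

// Hypothetical stand-in for the next element in the interceptor chain.
// In Tribes the real call is getNext().sendMessage(destination, msg, payload).
interface NextSender {
    void sendMessage(String[] destination, byte[] msg) throws Exception;
}

// Wraps the next sender and records how long each sendMessage call takes to return.
class TimingSender implements NextSender {
    private final NextSender next;
    private final List<Long> elapsedNanos = new ArrayList<>();

    TimingSender(NextSender next) {
        this.next = next;
    }

    @Override
    public void sendMessage(String[] destination, byte[] msg) throws Exception {
        long start = System.nanoTime();
        try {
            next.sendMessage(destination, msg);
        } finally {
            // This measures only the time until the call returns to the caller.
            long elapsed = System.nanoTime() - start;
            synchronized (elapsedNanos) {
                elapsedNanos.add(elapsed);
            }
            System.out.printf("send of %d bytes to %d member(s) returned after %.3f ms%n",
                    msg.length, destination.length, elapsed / 1_000_000.0);
        }
    }

    // Returns a copy of the recorded timings, one entry per send.
    List<Long> elapsedNanos() {
        synchronized (elapsedNanos) {
            return new ArrayList<>(elapsedNanos);
        }
    }
}
```

Note that a timer like this can only say when the call came back, not when the other member applied the message.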

Anyone have any ideas?

In a ChannelInterceptor, when getNext().sendMessage(destination, 
msg, payload) returns, does that mean that the message has been sent AND 
received by the recipient member, or does it only indicate a send?
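
For context on this question, here is a small sketch that decodes a channelSendOptions value into its flag names. The SendOptions class is hypothetical; the four values are meant to mirror the SEND_OPTIONS_* constants in org.apache.catalina.tribes.Channel, and only those four are covered. A value of 8 sets only the asynchronous bit and neither acknowledgement bit, so, as far as I understand it, a fast return from sendMessage need not imply receipt.

```java
import java.util.ArrayList;
import java.util.List;

// Hypothetical helper; values mirror SEND_OPTIONS_* in org.apache.catalina.tribes.Channel.
class SendOptions {
    static final int BYTE_MESSAGE = 0x0001;
    static final int USE_ACK = 0x0002;
    static final int SYNCHRONIZED_ACK = 0x0004;
    static final int ASYNCHRONOUS = 0x0008;

    // Returns the names of the flags (from the four above) set in options.
    static List<String> decode(int options) {
        List<String> names = new ArrayList<>();
        if ((options & BYTE_MESSAGE) != 0) names.add("BYTE_MESSAGE");
        if ((options & USE_ACK) != 0) names.add("USE_ACK");
        if ((options & SYNCHRONIZED_ACK) != 0) names.add("SYNCHRONIZED_ACK");
        if ((options & ASYNCHRONOUS) != 0) names.add("ASYNCHRONOUS");
        return names;
    }
}
```

For example, SendOptions.decode(8) yields only ASYNCHRONOUS, while SendOptions.decode(6) yields USE_ACK and SYNCHRONIZED_ACK.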


On 09/06/2018 01:53 PM, Mitch Claborn wrote:
> I'm using a cluster with the DeltaManager between two servers on Tomcat 
> 9.0.11. I've set channelSendOptions="8" (asynchronous session replication).
> I have a "health check" app that I run periodically, one of the 
> functions being to check that sessions are being replicated properly. 
> That app
> 1) Does a GET to tomcat A, calling a Struts action that creates a 
> session and stores a known value in it
> 2) Waits 2 seconds
> 3) Uses the session ID cookie from step 1 and makes a call to tomcat B, 
> to an action that retrieves that value from the session
> 4) Compares the two values from the session to make sure that they are 
> the same.
> Most of the time this check works fine, but occasionally the call to the 
> second server will find that the session does not exist on that server, 
> presumably because it has not replicated there yet. 2 seconds seems 
> a long time for a session to replicate, especially one as small as this 
> one is. If I decrease the amount of wait time at step 2, the failure 
> rate increases.
> I turned on the ThroughputInterceptor and have the following observations.
> - Server A has a transmit throughput around 10 MB/sec while B has only 
> around 3 MB/sec. This might be accounted for by the fact that B was the 
> last server to start, so A would have (I think) transmitted all of the 
> sessions at once when B started up, so it might get good throughput from 
> the big send??
> Questions:
> 1. Is 2 seconds a long time to replicate a session?
> 2. Other than actual network slowness, are there internal issues that 
> could cause the replication to be slow?
> 3. If so, is there any way to diagnose those?
> 4. I'm thinking about writing my own version of ThroughputInterceptor 
> that will give more information on specific messages and timings. Has 
> anyone tried that? In that interceptor can I access the session ID? That 
> would help me correlate timings between my failure reports and the 
> interceptor.
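
The four health check steps quoted above can be sketched roughly as follows. This is an illustration, not the actual health check app: ReplicationCheck, Created, and the two function parameters are hypothetical names, and the HTTP calls to tomcat A and tomcat B are passed in as functions so the logic can be exercised without a running cluster.

```java
import java.util.function.Function;
import java.util.function.Supplier;

class ReplicationCheck {
    // Hypothetical result of step 1: the session ID cookie and the value stored on A.
    static final class Created {
        final String sessionId;
        final String value;

        Created(String sessionId, String value) {
            this.sessionId = sessionId;
            this.value = value;
        }
    }

    // createOnA: performs the GET against tomcat A (step 1).
    // readFromB: given the session ID, returns the value seen on tomcat B,
    //            or null if the session does not exist there (step 3).
    static boolean check(Supplier<Created> createOnA,
                         Function<String, String> readFromB,
                         long waitMillis) throws InterruptedException {
        Created created = createOnA.get();                    // step 1
        Thread.sleep(waitMillis);                             // step 2
        String seenOnB = readFromB.apply(created.sessionId);  // step 3
        return created.value.equals(seenOnB);                 // step 4
    }
}
```

Passing 2000 for waitMillis corresponds to the 2 second wait in step 2; a null from readFromB corresponds to the failure case where the session is not found on B.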
