From Rainer Jung <>
Subject Re: Tuning session replication on clusters
Date Thu, 06 Sep 2012 14:57:59 GMT
On 06.09.2012 15:10, wrote:
> ...  This actually didn't surprise me after I
> discovered how large the sessions were.  Using JMX (VisualVM) I watched the
> Heap size on my two servers as I tested 7000 sessions.  Heap climbed
> approximately 1GB.  When I restarted node2, I watched node1's heap usage
> nearly double.
> This confirmed my suspicion that the replication process is putting a copy
> of all sessions into a new object (list I suppose?) before transmitting
> them.  After replication finished (109 seconds), node1's heap usage went
> back to normal.

That's a plausible explanation for your observation. You can split
replication into several chunks using the config items you already
noticed. Even in TC 6 the DeltaManager supports:

     sendAllSessions (default: true; true means all sessions are sent
       in one message, false means they are split across multiple
       messages)
     sendAllSessionsSize (default: 1000; number of sessions sent per
       message when sendAllSessions is false)
     sendAllSessionsWaitTime (default: 2000; pause in milliseconds
       between sending consecutive messages)
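
As a rough sketch, the three attributes above would be set on the
Manager element in server.xml. The values 500 and 100 are only
illustrative assumptions, not recommendations; you would need to test
what suits your session sizes and heap.

```xml
<!-- Fragment of server.xml, placed inside the <Cluster> element. -->
<!-- The numbers are assumed example values, not tuned settings.  -->
<Manager className="org.apache.catalina.ha.session.DeltaManager"
         sendAllSessions="false"
         sendAllSessionsSize="500"
         sendAllSessionsWaitTime="100"/>
```

With smaller blocks the sender should not need to hold all sessions in
one message at once, at the cost of more messages and the wait time
between them.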

> The aggregation of sessions into a new object to be sent (I presume as part
> of the handleGET_ALL_SESSIONS?) seems to work quickly, though I'm not sure
> how to test how much of the 109 seconds it took to replicate was Tomcat
> gathering up all the sessions to send and how much was network traffic.  We
> have a low utilization gigabit ethernet fabric connecting all servers, so
> transferring 1GB of data shouldn't take more than 10-12 seconds.
> Does anyone know if there are ways to time the different steps in the
> replication process?

Set the log level of org.apache.catalina.ha.session.DeltaManager to
DEBUG or FINE, depending on whether you are using log4j or JULI for
Tomcat.
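
A minimal sketch of what that could look like, assuming an otherwise
default logging setup. Use only the line that matches the logging
framework your Tomcat actually uses; the file locations in the comments
are the usual ones and may differ on your installation.

```properties
# JULI: add to conf/logging.properties.
# The handler that writes the log must also allow FINE or lower.
org.apache.catalina.ha.session.DeltaManager.level = FINE

# log4j 1.x: add to log4j.properties, only if Tomcat logs via log4j.
log4j.logger.org.apache.catalina.ha.session.DeltaManager=DEBUG
```

Comparing the timestamps of the resulting log lines on both nodes
should give you a first idea of where the time is spent.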

>  If it is the network send/receive process that's
> slow,

Try sniffing both ends for network analysis.

> are there transmit/receive settings for the sender/receiver that
> could aid in speeding up replication of large data chunks?  I see there are
> rxBufSize and txBufSize settings on the Receiver and Transport elements,
> and they're set to 25/43kb.  If those values represent how data is chunked
> then larger settings might help (similar to the throughput difference of
> transferring 100x 10MB files vs. 10,000x 100kb files on a network).



