tomcat-dev mailing list archives

From Alexander Maas ...@upcd.de>
Subject Possible bug regarding session timeouts in clustered tomcats
Date Tue, 31 Jul 2007 08:25:25 GMT
Hi,

we are using tomcat (6.0.13) in a clustered environment and have noticed
some inconsistent behavior regarding session timeouts.

Our setup is as follows:

Two machines, nodeA and nodeB, each configured identically:
   debian etch
   tomcat 6.0.13
   JDK 1.6.0_02
   apache 2.2.3
   mod_jk 1.2.23


The nodes are configured to replicate sessions via
<Engine ...>
    ...
    <Cluster className="org.apache.catalina.ha.tcp.SimpleTcpCluster"/>
    ...
</Engine>

There is no jvmRoute defined since we can't use sticky sessions because
the nodes are rotated via round-robin DNS.

The replication in itself works as expected, however we noticed that
some sessions took very long to expire or didn't expire at all.

After some digging and debugging we were able to track the problem to
the following cause:

1) User 1 visits the site and hits nodeA
2) Session 1 is created as primary on nodeA with a valid
     maxInactiveInterval ("120" in our testing case)
     It also gets replicated to nodeB but is created without
     maxInactiveInterval (so the default of -1 is used)
3) User gets nodeB for his next request
4) Session 1 becomes primary on nodeB

Now we have the following situation:
Session 1 is valid on both nodes but on nodeB doesn't have a
maxInactiveInterval.
Now one of two things can happen:

1) The user again jumps nodes and goes back to nodeA thereby making the
     session on nodeA primary again.
     In this case everything is good. The session still has a valid
     maxInactiveInterval and is expired once this timeout is hit.
2) The user stays on nodeB (or leaves the site).
     This is where the problem occurs:
     On nodeB the session has no maxInactiveInterval meaning it will never
     expire.
     nodeA OTOH still has a maxInactiveInterval but is not primary for
     this session. So it will expire the session after
     (2*maxInactiveInterval) but only locally because it assumes the
     primary for this session to be dead.
     Now we have a session that is "half" expired between the nodes.
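
The two outcomes above can be illustrated with a small self-contained model
of the idle check. This is only a simplified sketch, not Tomcat's code: the
class and method names are made up for illustration, and the doubled timeout
for non-primary copies is taken from the description above.

```java
// Simplified model of the idle check described above; NOT Tomcat's actual code.
final class SessionExpiryModel {

    /**
     * @param maxInactiveInterval timeout in seconds; a negative value means "never expire"
     * @param primary             whether this node holds the primary copy of the session
     * @param idleSeconds         seconds since the last access seen by this node
     * @return true if this node would expire its copy of the session
     */
    static boolean isExpired(int maxInactiveInterval, boolean primary, long idleSeconds) {
        if (maxInactiveInterval < 0) {
            return false; // no timeout set: this copy is never expired
        }
        // A non-primary copy waits twice as long, assuming the primary has died.
        long limit = primary ? maxInactiveInterval : 2L * maxInactiveInterval;
        return idleSeconds >= limit;
    }

    public static void main(String[] args) {
        // nodeB: primary, but the replica was created with the default of -1
        System.out.println("nodeB after 1 day: " + isExpired(-1, true, 86_400));
        // nodeA: non-primary copy with the configured timeout of 120 seconds
        System.out.println("nodeA after 200 s: " + isExpired(120, false, 200));
        System.out.println("nodeA after 240 s: " + isExpired(120, false, 240));
    }
}
```

In this model the copy on nodeB is never expired, while nodeA drops its own
copy once the idle time reaches 2*120 seconds, which is the "half" expired
state.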

We believe the problem originates in

org.apache.catalina.ha.session.DeltaManager
protected void handleSESSION_CREATED(SessionMessage msg,Member sender)

In line 1409 a new DeltaSession object is created via
createEmptySession() which - of course - inherits the default
maxInactiveInterval (-1) from org.apache.catalina.session.StandardSession.


The attached patch *SHOULD* fix this. It's untested because I wasn't
able to convince ant to build successfully (for reasons unrelated to
this patch).
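
As a rough illustration of the kind of change meant here: when a replica is
created for a "session created" message, it would take over the manager's
configured timeout instead of keeping the default of -1. The sketch below is
a self-contained model, not the attached patch and not Tomcat's classes;
ModelManager, ModelSession and handleSessionCreated are hypothetical names
chosen for this example.

```java
// Hypothetical model of the idea; these are stand-in classes, not Tomcat's.
final class ReplicaTimeoutSketch {

    /** Stand-in for a session; -1 mirrors the default mentioned above. */
    static final class ModelSession {
        int maxInactiveInterval = -1;
        boolean primary = true;
    }

    /** Stand-in for the session manager that knows the configured timeout. */
    static final class ModelManager {
        final int maxInactiveInterval;

        ModelManager(int maxInactiveInterval) {
            this.maxInactiveInterval = maxInactiveInterval;
        }

        /** Creates a session that still carries the default timeout of -1. */
        ModelSession createEmptySession() {
            return new ModelSession();
        }

        /** Models handling a "session created" message on the receiving node. */
        ModelSession handleSessionCreated() {
            ModelSession session = createEmptySession();
            session.primary = false;
            // The assumed change: apply the manager's timeout to the replica,
            // so the copy can expire correctly if it later becomes primary.
            session.maxInactiveInterval = this.maxInactiveInterval;
            return session;
        }
    }

    public static void main(String[] args) {
        ModelManager manager = new ModelManager(120);
        ModelSession replica = manager.handleSessionCreated();
        System.out.println("replica timeout: " + replica.maxInactiveInterval);
    }
}
```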


Alex.

