tomcat-users mailing list archives

From Daniel Mikusa <>
Subject Re: problem with clustering
Date Thu, 04 Apr 2013 13:01:02 GMT
On Apr 4, 2013, at 6:43 AM, Andy Pahne wrote:

> An application that has been running fine for years now suddenly does perform with varying
> results, sometimes as quick as always, but then sometimes a simple page request uses up to
> 30 seconds.

If you haven't changed anything with the application or your Tomcat configuration, then you'll
want to look at the external resources that your application depends upon, such as a database,
the network, shared file systems, etc…  If the performance of an external resource is suffering,
it could definitely be causing problems for your application.

> Since the performance did degrade we regularly find log items like the following one
> in catalina.out (many of them, about 100 to 300 per hour on each host):
> 04.04.2013 11:51:53
> INFO: Verification complete. Member still alive[org.apache.catalina.tribes.membership.MemberImpl[tcp://{-64,
> -88, 6, 21}:4000,{-64, -88, 6, 21},4000, alive=1706334,id={-99 120 -58 21 -84 121 74 45 -104
> -73 -123 -40 10 -76 70 59 }, payload={}, command={}, domain={}, ]]

I think that you'll typically see these when there is a network issue, but you would see them
anytime a member is timed out.
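To see whether these messages cluster around the slow periods, it can help to count them per
hour. Below is a minimal sketch in Python, assuming the log layout from your excerpt (a
timestamp line followed by the INFO line). The function name and the timestamp pattern are my
own assumptions, so adjust them to whatever your log formatter actually writes.

```python
import re
from collections import Counter

# Assumed layout: a timestamp line such as "04.04.2013 11:51:53 ..." followed
# by the message line. Adjust the pattern if your log formatter differs.
TIMESTAMP = re.compile(r"^(\d{2}\.\d{2}\.\d{4}) (\d{2}):\d{2}:\d{2}")
MARKER = "Verification complete. Member still alive"


def count_false_positives_per_hour(lines):
    """Return a Counter mapping (date, hour) to the number of marker lines."""
    counts = Counter()
    current_hour = None  # bucket of the most recent timestamp line seen
    for line in lines:
        match = TIMESTAMP.match(line)
        if match:
            current_hour = (match.group(1), match.group(2))
        # Count the marker against the last timestamp seen (same line or earlier).
        if MARKER in line and current_hour is not None:
            counts[current_hour] += 1
    return counts


# Usage (hypothetical path):
#   with open("catalina.out", encoding="utf-8", errors="replace") as log:
#       for (date, hour), n in count_false_positives_per_hour(log).items():
#           print(date, hour + ":00", n)
```

Comparing the hourly counts with the times of the slow page requests should show whether the
two are related.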

The connections between the nodes in your cluster are monitored with a heartbeat.  When a
node doesn't respond to the heartbeat the node is considered to have left the cluster.  To
protect against false positives you can configure a TcpFailureDetector.  This listens for
"memberDisappeared" events and when one occurs, it will connect to the member via TCP to try
to confirm its disappearance.
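The idea behind that TCP check can be sketched in a few lines of Python. This is not Tomcat's
implementation, only an illustration of "try to open a TCP connection and see if it succeeds".
The function name and the timeout value are assumptions of mine.

```python
import socket


def member_reachable(host, port, timeout_s=1.0):
    """Return True if a TCP connection to host:port succeeds within timeout_s."""
    try:
        # create_connection raises OSError (including timeouts) on failure.
        with socket.create_connection((host, port), timeout=timeout_s):
            return True
    except OSError:
        return False
```

You could run something like this from one node against the other node's receiver address and
port (4000 in your log entry) to check reachability by hand while the problem is occurring.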

In your case, the message that you are seeing is indicating that the heartbeat failed, but
that the TcpFailureDetector was able to verify the node still exists.  In other words, this
is a false positive.
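As a side note, the member address in that log entry is printed as signed Java bytes, which
makes it hard to tell which node is meant. A small Python sketch for converting it to a normal
IPv4 address (the function name is just for illustration):

```python
import ipaddress


def decode_member_host(signed_bytes):
    """Convert Java signed byte values (-128..127) to a dotted IPv4 address."""
    # Masking with 0xFF maps each signed value to its unsigned equivalent.
    unsigned = bytes(b & 0xFF for b in signed_bytes)
    return str(ipaddress.IPv4Address(unsigned))


print(decode_member_host([-64, -88, 6, 21]))  # 192.168.6.21
```

So the member in your message is 192.168.6.21, listening on port 4000.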

In addition to the TcpFailureDetector, you can also adjust the "frequency" and "dropTime"
attributes of the Membership element to control how often heartbeats are sent and how long a
member may go without sending one before it is timed out.  You might try adjusting these
settings to make the configuration more tolerant of your network.
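To get a feel for how "dropTime" interacts with delayed heartbeats, here is a rough Python
sketch. It is a simplification and not how Tomcat implements membership: it just flags every
gap between received heartbeats that is longer than the drop time. I believe the defaults are
frequency=500 and dropTime=3000 (both in milliseconds), but please check the documentation for
your version. The function name and the sample timestamps are made up.

```python
def timed_out_gaps(heartbeat_times_ms, drop_time_ms=3000):
    """Return (previous, next, gap) for each heartbeat gap longer than drop_time_ms."""
    gaps = []
    # Pair each heartbeat arrival time with the one that follows it.
    for prev, nxt in zip(heartbeat_times_ms, heartbeat_times_ms[1:]):
        gap = nxt - prev
        if gap > drop_time_ms:
            gaps.append((prev, nxt, gap))
    return gaps


# Heartbeats expected every 500 ms, but the network stalls for 3.5 seconds.
arrivals = [0, 500, 1000, 4500, 5000]
print(timed_out_gaps(arrivals, drop_time_ms=3000))  # [(1000, 4500, 3500)]
print(timed_out_gaps(arrivals, drop_time_ms=5000))  # []
```

With the larger drop time the same stall no longer counts as a timeout, at the cost of taking
longer to notice a node that has really gone away.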

> We ruled out that the recent changes to said application are the cause for the poor performance
> by simulating all sorts of heavy load on various test systems. It just works nicely in the
> test environment. However, on production it does not.
> We are using the SimpleTcpCluster solution for clustering on Tomcat 6. The cluster has
> two nodes.

It would be helpful to post your configuration, minus comments, as well as the exact version
of Tomcat that you are running.

> I am NOT suspecting a tomcat bug. And as I said I am not suspecting a performance bottleneck
> in our application or in the db queries it performs. At the moment I am thinking of a hardware
> failure of some kind (network interface, router etc.).
> Do you have any experience with this problem and what did you do to resolve it?

If you suspect a network issue, you could try monitoring with Wireshark or tcpdump to capture
the network packets between the two nodes.  Analysis of the packets could show if there is a
problem.  Another option would be to use a tool like iperf to put a high load on your network
and possibly trigger the problem.


> Thanks,
> Andy