Date: Tue, 1 Apr 2008 12:47:58 +0200 (CEST)
From: Ronald Klop
To: Tomcat Users List
Cc: David Rees
Subject: Re: Cluster Memory Leak - ClusterData and LinkObject classes

On Mon Mar 31 21:13:25 CEST 2008 Tomcat Users List <users@tomcat.apache.org> wrote:

On Mon, Mar 31, 2008 at 3:38 AM, Ronald Klop <ronald-mailinglist@base.nl> wrote:
>
> See my previous mail about send/receive buffers filling because Ack wasn't
> read by FastAsyncSender.
> The option waitForAck="true" did the trick for me. But for FastAsyncSender
> you should set sendAck="false" on the receiving side.

Thanks for the information, Ronald. Can you clarify your settings by
posting a minimal configuration? I looked for the option sendAck on
the Tomcat cluster page and couldn't find any reference to that
configuration parameter:
http://tomcat.apache.org/tomcat-5.5-doc/cluster-howto.html

It looks like doing something like one of the following two is a good
idea for a barebones setup, to make sure that the acking behavior is
consistent, since Tomcat doesn't seem to ensure that the settings are sane:

<Cluster className="org.apache.catalina.cluster.tcp.SimpleTcpCluster"
receiver.sendAck="true" sender.waitForAck="true"/>

<Cluster className="org.apache.catalina.cluster.tcp.SimpleTcpCluster"
receiver.sendAck="false" sender.waitForAck="false"/>

I'm a bit confused as to why this issue only affects one of my
clusters (out of 3 production clusters with identical setups) and why
more people aren't seeing it. Are most people specifying their Ack
settings? Or do most people not see enough traffic between restarts to
trigger this issue? Granted, the one that's affected also happens to
handle the most traffic by far. I'll have to do more testing on my
test cluster to verify (I've already turned on waitForAck everywhere
in production); hopefully I can reproduce it.

Anyone have information on how using Acks in the cluster affects performance?

-Dave


Hello Dave,

I attached my server.xml file. I hope the mailing list doesn't filter it.
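In case the mailing list filters the attachment, the relevant part looks roughly like this. This is only a sketch with the defaults from the cluster-howto, not my exact file; the point is that the receiver's sendAck and the senders' waitForAck agree on every node:

<Cluster className="org.apache.catalina.cluster.tcp.SimpleTcpCluster"
         managerClassName="org.apache.catalina.cluster.session.DeltaManager">
  <Membership className="org.apache.catalina.cluster.mcast.McastService"
              mcastAddr="228.0.0.4" mcastPort="45564"
              mcastFrequency="500" mcastDropTime="3000"/>
  <!-- Receiver: sendAck here must match waitForAck on the other nodes' senders -->
  <Receiver className="org.apache.catalina.cluster.tcp.ReplicationListener"
            tcpListenAddress="auto" tcpListenPort="4001"
            sendAck="false"/>
  <!-- FastAsyncSender (5.5.26) does not read the Acks, so don't send or wait for them -->
  <Sender className="org.apache.catalina.cluster.tcp.ReplicationTransmitter"
          replicationMode="fastasyncqueue"
          waitForAck="false"/>
</Cluster>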

I think a lot of people don't send enough sessions between the nodes to see this, or they use sticky sessions.
The problem is that the Acks are not read by Tomcat (by the FastAsyncSender). First the network receive buffer on the node that sends the sessions fills up with unread Acks. Only after that receive buffer is full does the send buffer on the other node (which sends the Acks) start to fill. And only after that send buffer is full does that node stop sending Acks and reading sessions, because it blocks in Socket.write(ack_buffer). From that moment on the node no longer reads new session data from the network, and only then will you experience failures in your application.

An Ack is only 3 bytes, so you need to sync a lot of sessions before the receive buffer and send buffer fill up.
My receive buffers are about 90 KB and my send buffers are 32 KB.
(90 KB + 32 KB) / 3 bytes = roughly 41643 acks before the syncing stops.

I see (at this moment) on average 1.5 session messages per second. So it takes me roughly 41643 / 1.5 = 28000 seconds, almost 8 hours, before my clustering stops.

But I could also see the receive buffer filling up 3 bytes at a time in a lab environment. (Use netstat, for example, to see this.)
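For example, something like this on the node that sends the sessions (addresses and port are made up for illustration; 4001 is the replication listen port from the howto defaults):

  netstat -an | grep 4001
  tcp4   41532      0  10.0.10.52.52144    10.0.10.53.4001     ESTABLISHED

The Recv-Q column (the unread Acks) grows 3 bytes per replicated message on the sending node; once it is full, the Send-Q on the other node starts to grow, long before the application itself shows any errors.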


Why I have only been seeing this problem for the last two weeks is a mystery to me too.

Ronald.


[Attachment: server.xml]