From: Filip Hanik - Dev Lists
Date: Tue, 25 Aug 2009 09:35:44 -0600
To: Tomcat Users List
Subject: Re: Tomcat cluster fails and generates tons of logs

I've taken a look at the code. The fix for this is easy, but it doesn't
explain why it happens. This is a concurrency issue, but if you're not
running the latest tomcat version, then it could already have been fixed.

best
Filip

On 08/25/2009 01:55 AM, CS Wong wrote:
> Hi Michael,
> The logs are the bit that went haywire. The applications at this point still
> work but often, there's not enough time to troubleshoot much else. The logs
> can increase by 5-6GB in a matter of an hour or so and hence, we often just
> kill the service (normal shutdown.sh doesn't respond any more at this point,
> we have to kill -9 it) in panic and delete the logs before the entire server
> goes kaboom.
> This time, I managed to tail out some of the logs, for which I
> pasted an extract (same repeating pattern of errors):
>
> Aug 25, 2009 11:44:02 AM org.apache.catalina.ha.session.DeltaRequest reset
> SEVERE: Unable to remove element
> java.util.NoSuchElementException
>     at java.util.LinkedList.remove(LinkedList.java:788)
>     at java.util.LinkedList.removeFirst(LinkedList.java:134)
>     at org.apache.catalina.ha.session.DeltaRequest.reset(DeltaRequest.java:201)
>     at org.apache.catalina.ha.session.DeltaRequest.execute(DeltaRequest.java:195)
>     at org.apache.catalina.ha.session.DeltaManager.handleSESSION_DELTA(DeltaManager.java:1364)
>     at org.apache.catalina.ha.session.DeltaManager.messageReceived(DeltaManager.java:1320)
>     at org.apache.catalina.ha.session.DeltaManager.messageDataReceived(DeltaManager.java:1083)
>     at org.apache.catalina.ha.session.ClusterSessionListener.messageReceived(ClusterSessionListener.java:87)
>     at org.apache.catalina.ha.tcp.SimpleTcpCluster.messageReceived(SimpleTcpCluster.java:916)
>     at org.apache.catalina.ha.tcp.SimpleTcpCluster.messageReceived(SimpleTcpCluster.java:897)
>     at org.apache.catalina.tribes.group.GroupChannel.messageReceived(GroupChannel.java:264)
>     at org.apache.catalina.tribes.group.ChannelInterceptorBase.messageReceived(ChannelInterceptorBase.java:79)
>     at org.apache.catalina.tribes.group.interceptors.TcpFailureDetector.messageReceived(TcpFailureDetector.java:110)
>     at org.apache.catalina.tribes.group.ChannelInterceptorBase.messageReceived(ChannelInterceptorBase.java:79)
>     at org.apache.catalina.tribes.group.ChannelInterceptorBase.messageReceived(ChannelInterceptorBase.java:79)
>     at org.apache.catalina.tribes.group.ChannelInterceptorBase.messageReceived(ChannelInterceptorBase.java:79)
>     at org.apache.catalina.tribes.group.ChannelCoordinator.messageReceived(ChannelCoordinator.java:241)
>     at org.apache.catalina.tribes.transport.ReceiverBase.messageDataReceived(ReceiverBase.java:225)
>     at org.apache.catalina.tribes.transport.nio.NioReplicationTask.drainChannel(NioReplicationTask.java:188)
>     at org.apache.catalina.tribes.transport.nio.NioReplicationTask.run(NioReplicationTask.java:91)
>     at java.util.concurrent.ThreadPoolExecutor$Worker.runTask(ThreadPoolExecutor.java:885)
>     at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:907)
>     at java.lang.Thread.run(Thread.java:619)
>
> Wong
>
> On Tue, Aug 25, 2009 at 3:36 PM, Michael Ludwig wrote:
>
>> CS Wong wrote:
>>
>>> Periodically, I'm getting problems with my Tomcat 6 cluster (2 nodes).
>>> One of the nodes would just go haywire
>>
>> Could you elaborate on what "going haywire" means?
>>
>> Below, you write:
>>
>>> [The NoSuchElementException is] the only thing that it shows. The
>>> other node in the cluster is still active at this time. There's
>>> nothing to do but to restart. The large amount of logs has caused
>>> disk space issues more than a couple of times too.
>>
>> So is that server not active any more? Unresponsive? Hyperactive writing
>> to the log file? Looping?
>>
>>> and generate a ton of logs repeating the following:
>>>
>>> Aug 25, 2009 11:44:10 AM org.apache.catalina.ha.session.DeltaRequest reset
>>> SEVERE: Unable to remove element
>>> java.util.NoSuchElementException
>>>     at java.util.LinkedList.remove(LinkedList.java:788)
>>>     at java.util.LinkedList.removeFirst(LinkedList.java:134)
>>>     at org.apache.catalina.ha.session.DeltaRequest.reset(DeltaRequest.java:201)
>>>     at org.apache.catalina.ha.session.DeltaRequest.execute(DeltaRequest.java:195)
>>>     at org.apache.catalina.ha.session.DeltaManager.handleSESSION_DELTA(DeltaManager.java:1364)
>>>     at org.apache.catalina.ha.session.DeltaManager.messageReceived(DeltaManager.java:1320)
>>>     at org.apache.catalina.ha.session.DeltaManager.messageDataReceived(DeltaManager.java:1083)
>>>     at org.apache.catalina.ha.session.ClusterSessionListener.messageReceived(ClusterSessionListener.java:87)
>>
>> I only found this, which seems to have led you here:
>>
>> http://stackoverflow.com/questions/1326336/
>>
>> Maybe it is helpful to others who know about Tomcat internals.
>>
>> --
>> Michael Ludwig
>>
>> ---------------------------------------------------------------------
>> To unsubscribe, e-mail: users-unsubscribe@tomcat.apache.org
>> For additional commands, e-mail: users-help@tomcat.apache.org
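
A note on the trace quoted in this thread: it ends in LinkedList.removeFirst(), which throws NoSuchElementException when the list is empty, and java.util.LinkedList is not thread-safe. The following is only a minimal sketch of a defensive drain, not the actual Tomcat fix (which this thread does not show). The class DrainSketch and its methods are hypothetical names, and the sketch assumes the failing code drains a list that more than one thread can touch. It guards every access with one lock and uses pollFirst(), which returns null on an empty list instead of throwing.

```java
import java.util.LinkedList;
import java.util.NoSuchElementException;

// Hypothetical illustration; not Tomcat's DeltaRequest.
public class DrainSketch {
    // LinkedList is not thread-safe, so all access goes through
    // synchronized methods on this object.
    private final LinkedList<String> actions = new LinkedList<>();

    public synchronized void add(String action) {
        actions.addLast(action);
    }

    // Drains the list and returns how many elements were removed.
    // pollFirst() returns null on an empty list, so the loop ends
    // cleanly instead of throwing NoSuchElementException.
    public synchronized int reset() {
        int drained = 0;
        while (actions.pollFirst() != null) {
            drained++;
        }
        return drained;
    }

    // Shows the behaviour seen in the trace: removeFirst() on an
    // empty LinkedList throws NoSuchElementException.
    public static boolean removeFirstThrowsOnEmpty() {
        try {
            new LinkedList<String>().removeFirst();
            return false;
        } catch (NoSuchElementException e) {
            return true;
        }
    }

    public static void main(String[] args) {
        DrainSketch sketch = new DrainSketch();
        sketch.add("a");
        sketch.add("b");
        System.out.println("drained: " + sketch.reset());
        System.out.println("drained again: " + sketch.reset());
        System.out.println("removeFirst throws on empty: " + removeFirstThrowsOnEmpty());
    }
}
```

Whether locking or a concurrent collection is the better choice depends on how the real code shares the list, which cannot be determined from the trace alone.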