tomcat-users mailing list archives

From Filip Hanik - Dev Lists <devli...@hanik.com>
Subject Re: Tomcat cluster fails and generates tons of logs
Date Wed, 26 Aug 2009 02:39:46 GMT
Hi Wong, yes, that change does implement a higher level of thread safety,
and would most likely resolve your problem.
Be aware that 6.0.20 has a regression where Tomcat nodes on the same host
won't discover each other:
https://issues.apache.org/bugzilla/show_bug.cgi?id=47308

Filip
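
As a minimal sketch of the kind of race that can end in the
NoSuchElementException quoted below: one thread decides how many elements to
remove from a LinkedList, another thread empties the list in the meantime,
and removeFirst() then fails. This is not Tomcat's DeltaRequest code. The
class DeltaResetSketch and its methods are hypothetical, and the second
thread is simulated with a plain clear() call so that the failure is
reproducible.

```java
import java.util.LinkedList;
import java.util.NoSuchElementException;

/** Hypothetical illustration only; not Tomcat's DeltaRequest implementation. */
class DeltaResetSketch {

    /**
     * Drains the list using a size that was read earlier. If the list was
     * emptied in between, removeFirst() throws NoSuchElementException.
     */
    static void unsafeReset(LinkedList<String> actions, int staleSize) {
        for (int i = 0; i < staleSize; i++) {
            actions.removeFirst();
        }
    }

    /**
     * Drains the list while holding a single shared lock and uses pollFirst(),
     * which returns null on an empty list instead of throwing.
     * Assumes the list never holds null elements.
     */
    static int safeReset(LinkedList<String> actions, Object lock) {
        int removed = 0;
        synchronized (lock) {
            while (actions.pollFirst() != null) {
                removed++;
            }
        }
        return removed;
    }

    public static void main(String[] args) {
        LinkedList<String> actions = new LinkedList<>();
        actions.add("setAttribute");
        actions.add("removeAttribute");

        int staleSize = actions.size(); // "thread A" reads the size
        actions.clear();                // "thread B" drains the list (simulated)

        try {
            unsafeReset(actions, staleSize);
        } catch (NoSuchElementException e) {
            System.out.println("unsafe reset failed: " + e);
        }

        actions.add("setAttribute");
        System.out.println("safe reset removed " + safeReset(actions, new Object()));
    }
}
```

The guarded variant only helps if every reader and writer of the list takes
the same lock; r618823, quoted below, moves that responsibility to the
session lock.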

On 08/25/2009 07:22 PM, CS Wong wrote:
> A brief look through
> "svn log http://svn.apache.org/repos/asf/tomcat/trunk/java/org/apache/catalina/ha/session/DeltaRequest.java"
> turns up this:
> ------------------------------------------------------------------------
> r618823 | fhanik | 2008-02-06 07:29:56 +0800 (Wed, 06 Feb 2008) | 3 lines
>
> Remove synchronization on the DeltaRequest object, and let the object that
> manages the delta request (session/manager) to handle the locking properly,
> using the session lock
> There is a case with a non sticky load balancer where using synchronized and
> a lock (essentially two locks) can end up in a dead lock
> ------------------------------------------------------------------------
>
> This is the only commit whose comments seem to indicate anything related to
> my issue. Since 6.0.14 was released on 14 Aug 2007
> (http://www.mail-archive.com/announce@apache.org/msg00386.html), before this
> commit was made, the change may be applicable.
>
> I'd just like to know your opinion: is it likely that this is the issue
> I'm facing? Thanks!
>
> Wong
>
> On Wed, Aug 26, 2009 at 8:48 AM, CS Wong <lilwong@gmail.com> wrote:
>
>> Thanks, Filip.
>> I'm running 6.0.14 right now. Would you have any idea whether any changes
>> in the code since then would have fixed something like this? I can try to
>> push for an upgrade to 6.0.20, but the app owners would probably want to know
>> whether that would fix it for sure, since they have to go through a rather
>> troublesome and time-consuming round of testing. It would help if they knew
>> that the problem won't recur once this has been done.
>>
>> Thanks,
>> Wong
>>
>> On Tue, Aug 25, 2009 at 11:35 PM, Filip Hanik - Dev Lists <devlists@hanik.com> wrote:
>>
>>> I've taken a look at the code.
>>> The fix for this is easy, but it doesn't explain why it happens. This is a
>>> concurrency issue, but if you're not running the latest tomcat version, then
>>> it could already have been fixed.
>>>
>>> best
>>> Filip
>>>
>>> On 08/25/2009 01:55 AM, CS Wong wrote:
>>>
>>>> Hi Michael,
>>>> The logs are the bit that went haywire. The applications still work at
>>>> this point, but often there's not enough time to troubleshoot much else.
>>>> The logs can grow by 5-6GB in about an hour, so we often just kill the
>>>> service in a panic (the normal shutdown.sh no longer responds at this
>>>> point, so we have to kill -9 it) and delete the logs before the entire
>>>> server goes kaboom. This time, I managed to tail some of the logs; here is
>>>> an extract (the same pattern of errors repeats):
>>>>
>>>> Aug 25, 2009 11:44:02 AM org.apache.catalina.ha.session.DeltaRequest reset
>>>> SEVERE: Unable to remove element
>>>> java.util.NoSuchElementException
>>>>         at java.util.LinkedList.remove(LinkedList.java:788)
>>>>         at java.util.LinkedList.removeFirst(LinkedList.java:134)
>>>>         at org.apache.catalina.ha.session.DeltaRequest.reset(DeltaRequest.java:201)
>>>>         at org.apache.catalina.ha.session.DeltaRequest.execute(DeltaRequest.java:195)
>>>>         at org.apache.catalina.ha.session.DeltaManager.handleSESSION_DELTA(DeltaManager.java:1364)
>>>>         at org.apache.catalina.ha.session.DeltaManager.messageReceived(DeltaManager.java:1320)
>>>>         at org.apache.catalina.ha.session.DeltaManager.messageDataReceived(DeltaManager.java:1083)
>>>>         at org.apache.catalina.ha.session.ClusterSessionListener.messageReceived(ClusterSessionListener.java:87)
>>>>         at org.apache.catalina.ha.tcp.SimpleTcpCluster.messageReceived(SimpleTcpCluster.java:916)
>>>>         at org.apache.catalina.ha.tcp.SimpleTcpCluster.messageReceived(SimpleTcpCluster.java:897)
>>>>         at org.apache.catalina.tribes.group.GroupChannel.messageReceived(GroupChannel.java:264)
>>>>         at org.apache.catalina.tribes.group.ChannelInterceptorBase.messageReceived(ChannelInterceptorBase.java:79)
>>>>         at org.apache.catalina.tribes.group.interceptors.TcpFailureDetector.messageReceived(TcpFailureDetector.java:110)
>>>>         at org.apache.catalina.tribes.group.ChannelInterceptorBase.messageReceived(ChannelInterceptorBase.java:79)
>>>>         at org.apache.catalina.tribes.group.ChannelInterceptorBase.messageReceived(ChannelInterceptorBase.java:79)
>>>>         at org.apache.catalina.tribes.group.ChannelInterceptorBase.messageReceived(ChannelInterceptorBase.java:79)
>>>>         at org.apache.catalina.tribes.group.ChannelCoordinator.messageReceived(ChannelCoordinator.java:241)
>>>>         at org.apache.catalina.tribes.transport.ReceiverBase.messageDataReceived(ReceiverBase.java:225)
>>>>         at org.apache.catalina.tribes.transport.nio.NioReplicationTask.drainChannel(NioReplicationTask.java:188)
>>>>         at org.apache.catalina.tribes.transport.nio.NioReplicationTask.run(NioReplicationTask.java:91)
>>>>         at java.util.concurrent.ThreadPoolExecutor$Worker.runTask(ThreadPoolExecutor.java:885)
>>>>         at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:907)
>>>>         at java.lang.Thread.run(Thread.java:619)
>>>>
>>>> Wong
>>>>
>>>>
>>>>
>>>> On Tue, Aug 25, 2009 at 3:36 PM, Michael Ludwig <mlu@as-guides.com> wrote:
>>>>
>>>>> CS Wong wrote:
>>>>>
>>>>>> Periodically, I'm getting problems with my Tomcat 6 cluster (2 nodes).
>>>>>> One of the nodes would just go haywire
>>>>>
>>>>> Could you elaborate on what "going haywire" means? Below, you write:
>>>>>
>>>>>> [The NoSuchElementException is] the only thing that it shows. The
>>>>>> other node in the cluster is still active at this time. There's
>>>>>> nothing to do but to restart. The large amount of logs has caused
>>>>>> disk space issues more than a couple of times too.
>>>>>
>>>>> So is that server not active any more? Unresponsive? Hyperactive writing
>>>>> to the log file? Looping?
>>>>>
>>>>>> and generate a ton of logs repeating the following:
>>>>>>
>>>>>> Aug 25, 2009 11:44:10 AM org.apache.catalina.ha.session.DeltaRequest reset
>>>>>> SEVERE: Unable to remove element
>>>>>> java.util.NoSuchElementException
>>>>>>         at java.util.LinkedList.remove(LinkedList.java:788)
>>>>>>         at java.util.LinkedList.removeFirst(LinkedList.java:134)
>>>>>>         at org.apache.catalina.ha.session.DeltaRequest.reset(DeltaRequest.java:201)
>>>>>>         at org.apache.catalina.ha.session.DeltaRequest.execute(DeltaRequest.java:195)
>>>>>>         at org.apache.catalina.ha.session.DeltaManager.handleSESSION_DELTA(DeltaManager.java:1364)
>>>>>>         at org.apache.catalina.ha.session.DeltaManager.messageReceived(DeltaManager.java:1320)
>>>>>>         at org.apache.catalina.ha.session.DeltaManager.messageDataReceived(DeltaManager.java:1083)
>>>>>>         at org.apache.catalina.ha.session.ClusterSessionListener.messageReceived(ClusterSessionListener.java:87)
>>>>>
>>>>> I only found this, which seems to have led you here:
>>>>>
>>>>> http://stackoverflow.com/questions/1326336/
>>>>>
>>>>> Maybe it is helpful to others who know about Tomcat internals.
>>>>>
>>>>> --
>>>>> Michael Ludwig
>>>>>
>>>>> ---------------------------------------------------------------------
>>>>> To unsubscribe, e-mail: users-unsubscribe@tomcat.apache.org
>>>>> For additional commands, e-mail: users-help@tomcat.apache.org