tomcat-dev mailing list archives

From bugzi...@apache.org
Subject DO NOT REPLY [Bug 37896] New: - FastAsyncSocketSender blocks all threads on socket error
Date Wed, 14 Dec 2005 02:54:45 GMT
DO NOT REPLY TO THIS EMAIL, BUT PLEASE POST YOUR BUG
RELATED COMMENTS THROUGH THE WEB INTERFACE AVAILABLE AT
<http://issues.apache.org/bugzilla/show_bug.cgi?id=37896>.
ANY REPLY MADE TO THIS MESSAGE WILL NOT BE COLLECTED AND
INSERTED IN THE BUG DATABASE.

http://issues.apache.org/bugzilla/show_bug.cgi?id=37896

           Summary: FastAsyncSocketSender blocks all threads on socket error
           Product: Tomcat 5
           Version: 5.5.12
          Platform: Other
        OS/Version: other
            Status: NEW
          Severity: normal
          Priority: P2
         Component: Catalina:Cluster
        AssignedTo: tomcat-dev@jakarta.apache.org
        ReportedBy: tedman@sfu.ca


If one server fails "badly" (I believe this results in a socket timeout error),
the FastAsyncSocketSender stays locked by a single thread. That causes a backlog
of all subsequent HTTP threads, and eventually the entire machine runs out of
sockets.

Details below :

Default cluster settings:
<Cluster className="org.apache.catalina.cluster.tcp.SimpleTcpCluster" />

We have multiple web machines (six of them). Something really bad happened at
our data center (we're not sure what yet: a cable fault, some dweeb tripping over
our Ethernet cable, we don't know) and caused one of our web servers to die.

The rest of the machines then backlogged trying to replicate to the dead
machine, which made every web server hit its maximum thread count and caused a
site outage.

We took stack traces at the point where we had to restart the Tomcat process.
What I believe to be the relevant stack traces are included below.

You can see that one of the HTTP threads (143) is trying to replicate
synchronously (which I found odd when using FastAsyncSocketSender, but okay). I
believe this thread is stuck on a two-minute socket timeout, and it currently
holds the lock on the FastAsyncSocketSender.

Notice that the Cluster-MembershipReceiver thread is waiting for the
FastAsyncSocketSender object and currently holds the lock on the
ReplicationTransmitter.

Notice that HTTP thread 147 is waiting on the ReplicationTransmitter. As a
result, I had about 298 other HTTP threads all waiting on the
ReplicationTransmitter. I had 300 threads configured.
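The lock chain described above can be reproduced in isolation. The sketch below is hypothetical, not Tomcat code: `senderLock` and `transmitterLock` stand in for the FastAsyncSocketSender and ReplicationTransmitter monitors, and a `Thread.sleep` stands in for the stalled `socketWrite0` call. Once the first thread holds the sender lock, the membership thread parks on it while still holding the transmitter lock, so any thread that only needs the transmitter lock (here a single `stats` thread, standing in for the ~298 HTTP threads) is blocked too.

```java
import java.util.Arrays;
import java.util.concurrent.CountDownLatch;
import java.util.concurrent.TimeUnit;

/**
 * Hypothetical reproduction of the lock chain in the thread dumps; not Tomcat code.
 * senderLock ~ FastAsyncSocketSender monitor, transmitterLock ~ ReplicationTransmitter monitor.
 */
public class LockChainDemo {
    static final Object senderLock = new Object();
    static final Object transmitterLock = new Object();

    /** Returns the states of [writer, membership, stats] sampled while the chain is in place. */
    public static Thread.State[] run() {
        CountDownLatch senderHeld = new CountDownLatch(1);
        CountDownLatch transmitterHeld = new CountDownLatch(1);

        // Like http-80-Processor143: holds the sender lock during a write that stalls.
        Thread writer = new Thread(() -> {
            synchronized (senderLock) {
                senderHeld.countDown();
                sleepQuietly(1500); // assumption: stands in for a stuck socketWrite0
            }
        }, "writer");

        // Like Cluster-MembershipReceiver: takes the transmitter lock, then needs the sender lock.
        Thread membership = new Thread(() -> {
            awaitQuietly(senderHeld);
            synchronized (transmitterLock) {
                transmitterHeld.countDown();
                synchronized (senderLock) {
                    // disconnect() would run here
                }
            }
        }, "membership");

        // Like http-80-Processor147: only needs the transmitter lock, e.g. for addStats().
        Thread stats = new Thread(() -> {
            awaitQuietly(transmitterHeld);
            synchronized (transmitterLock) {
                // addStats() would run here
            }
        }, "stats");

        writer.start();
        membership.start();
        stats.start();

        // Poll until both waiting threads are parked on a monitor (or give up after 1 s).
        long deadline = System.nanoTime() + TimeUnit.SECONDS.toNanos(1);
        while (System.nanoTime() < deadline
                && (membership.getState() != Thread.State.BLOCKED
                    || stats.getState() != Thread.State.BLOCKED)) {
            sleepQuietly(10);
        }
        Thread.State[] states = { writer.getState(), membership.getState(), stats.getState() };

        // Once the simulated write finishes, the whole chain drains on its own.
        joinQuietly(writer);
        joinQuietly(membership);
        joinQuietly(stats);
        return states;
    }

    static void sleepQuietly(long ms) {
        try { Thread.sleep(ms); } catch (InterruptedException e) { Thread.currentThread().interrupt(); }
    }

    static void awaitQuietly(CountDownLatch latch) {
        try { latch.await(); } catch (InterruptedException e) { Thread.currentThread().interrupt(); }
    }

    static void joinQuietly(Thread t) {
        try { t.join(); } catch (InterruptedException e) { Thread.currentThread().interrupt(); }
    }

    public static void main(String[] args) {
        System.out.println(Arrays.toString(run()));
    }
}
```

Running `main` should print `[TIMED_WAITING, BLOCKED, BLOCKED]`: the writer is sleeping (where thread 143 was `runnable` inside native code), while the other two are blocked on monitors. When the simulated write ends, all three threads finish, which matches the observation that everything recovers once the socket finally times out.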

Now, I realise that after a "while" the socket will time out and it will all
work itself out, but our site was stuck in this state for over 10 minutes. I
consider this a bug because one machine dying (albeit badly) shouldn't cause all
the other machines to back up at all.
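One way to keep a single dead member from stalling every request thread is sketched below under stated assumptions; it is not the fix applied in Tomcat. Member removal detaches the sender from a `ConcurrentHashMap` instead of holding a transmitter-wide monitor, statistics use an `AtomicLong`, and disconnect waits on the sender's lock only for a bounded time via `ReentrantLock.tryLock`. `SafeTransmitter`, `Sender`, and their methods are hypothetical names.

```java
import java.util.Map;
import java.util.concurrent.ConcurrentHashMap;
import java.util.concurrent.CountDownLatch;
import java.util.concurrent.TimeUnit;
import java.util.concurrent.atomic.AtomicLong;
import java.util.concurrent.locks.ReentrantLock;

/**
 * Hypothetical sketch, not Tomcat code: member removal and statistics never
 * wait indefinitely on a sender that may be stuck in a socket write.
 */
public class SafeTransmitter {

    /** Stand-in for a per-member sender; its lock guards the member's socket. */
    public static final class Sender {
        final ReentrantLock lock = new ReentrantLock();
        private volatile boolean connected = true;

        public boolean isConnected() { return connected; }

        /** Returns false if a send still holds the lock after timeoutMs. */
        boolean tryDisconnect(long timeoutMs) {
            try {
                if (!lock.tryLock(timeoutMs, TimeUnit.MILLISECONDS)) {
                    return false; // send still stuck; caller may retry or close the socket
                }
            } catch (InterruptedException e) {
                Thread.currentThread().interrupt();
                return false;
            }
            try {
                connected = false;
                return true;
            } finally {
                lock.unlock();
            }
        }
    }

    private final Map<String, Sender> senders = new ConcurrentHashMap<>();
    private final AtomicLong messagesSent = new AtomicLong(); // lock-free statistics

    public Sender add(String member) {
        Sender s = new Sender();
        senders.put(member, s);
        return s;
    }

    public boolean contains(String member) { return senders.containsKey(member); }

    /** Replaces a synchronized addStats(): no shared monitor for request threads to queue on. */
    public void addStats() { messagesSent.incrementAndGet(); }

    public long messagesSent() { return messagesSent.get(); }

    /** Detach first (atomic map removal, no transmitter-wide lock), then disconnect with a bounded wait. */
    public boolean remove(String member, long timeoutMs) {
        Sender s = senders.remove(member);
        return s == null || s.tryDisconnect(timeoutMs);
    }

    public static void main(String[] args) throws InterruptedException {
        SafeTransmitter t = new SafeTransmitter();
        Sender stuck = t.add("dead-node");
        CountDownLatch held = new CountDownLatch(1);
        CountDownLatch release = new CountDownLatch(1);

        // Simulates a request thread blocked inside a socket write while holding the sender lock.
        Thread writer = new Thread(() -> {
            stuck.lock.lock();
            try {
                held.countDown();
                release.await();
            } catch (InterruptedException e) {
                Thread.currentThread().interrupt();
            } finally {
                stuck.lock.unlock();
            }
        });
        writer.start();
        held.await();

        boolean disconnected = t.remove("dead-node", 100); // gives up after about 100 ms
        t.addStats();                                       // unaffected by the stuck sender
        System.out.println("disconnected=" + disconnected
                + " stillListed=" + t.contains("dead-node")
                + " sent=" + t.messagesSent());

        release.countDown();
        writer.join();
    }
}
```

In the demo, `remove` returns after about 100 ms with `disconnected=false` while the simulated send still holds the sender's lock, the member is already gone from the map, and `addStats` completes immediately. A real implementation would still have to deal with a sender it could not disconnect, for example by closing its socket: the `java.net.Socket.close()` documentation states that any thread blocked in an I/O operation on that socket will then throw a `SocketException`.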

------------

"http-80-Processor143" daemon prio=1 tid=0x084ad748 nid=0x6953 runnable [0x7e7bf000..0x7e7bf63c]
	at java.net.SocketOutputStream.socketWrite0(Native Method)
	at java.net.SocketOutputStream.socketWrite(SocketOutputStream.java:92)
	at java.net.SocketOutputStream.write(SocketOutputStream.java:124)
	at org.apache.catalina.cluster.tcp.DataSender.writeData(DataSender.java:830)
	at org.apache.catalina.cluster.tcp.DataSender.pushMessage(DataSender.java:772)
	at org.apache.catalina.cluster.tcp.DataSender.sendMessage(DataSender.java:598)
	- locked <0x4e7864f8> (a org.apache.catalina.cluster.tcp.FastAsyncSocketSender)
	at org.apache.catalina.cluster.tcp.ReplicationTransmitter.sendMessageData(ReplicationTransmitter.java:868)
	at org.apache.catalina.cluster.tcp.ReplicationTransmitter.sendMessageClusterDomain(ReplicationTransmitter.java:460)
	at org.apache.catalina.cluster.tcp.SimpleTcpCluster.sendClusterDomain(SimpleTcpCluster.java:1017)
	at org.apache.catalina.cluster.tcp.ReplicationValve.sendSessionReplicationMessage(ReplicationValve.java:333)
	at org.apache.catalina.cluster.tcp.ReplicationValve.invoke(ReplicationValve.java:271)
	at org.apache.catalina.valves.ErrorReportValve.invoke(ErrorReportValve.java:105)
	at org.apache.catalina.core.StandardEngineValve.invoke(StandardEngineValve.java:107)
	at org.apache.catalina.valves.FastCommonAccessLogValve.invoke(FastCommonAccessLogValve.java:495)
	at org.apache.catalina.connector.CoyoteAdapter.service(CoyoteAdapter.java:148)
	at org.apache.coyote.http11.Http11Processor.process(Http11Processor.java:868)
	at org.apache.coyote.http11.Http11BaseProtocol$Http11ConnectionHandler.processConnection(Http11BaseProtocol.java:663)
	at org.apache.tomcat.util.net.PoolTcpEndpoint.processSocket(PoolTcpEndpoint.java:527)
	at org.apache.tomcat.util.net.LeaderFollowerWorkerThread.runIt(LeaderFollowerWorkerThread.java:80)
	at org.apache.tomcat.util.threads.ThreadPool$ControlRunnable.run(ThreadPool.java:684)
	at java.lang.Thread.run(Thread.java:595)

"Cluster-MembershipReceiver" daemon prio=1 tid=0x78804ad8 nid=0x661c waiting for monitor entry [0x786ff000..0x786ff73c]
	at org.apache.catalina.cluster.tcp.DataSender.disconnect(DataSender.java:560)
	- waiting to lock <0x4e7864f8> (a org.apache.catalina.cluster.tcp.FastAsyncSocketSender)
	at org.apache.catalina.cluster.tcp.FastAsyncSocketSender.disconnect(FastAsyncSocketSender.java:295)
	at org.apache.catalina.cluster.tcp.ReplicationTransmitter.remove(ReplicationTransmitter.java:689)
	- locked <0x4e7a4e68> (a org.apache.catalina.cluster.tcp.ReplicationTransmitter)
	at org.apache.catalina.cluster.tcp.SimpleTcpCluster.memberDisappeared(SimpleTcpCluster.java:1124)
	at org.apache.catalina.cluster.mcast.McastService.memberDisappeared(McastService.java:455)
	at org.apache.catalina.cluster.mcast.McastServiceImpl.receive(McastServiceImpl.java:221)
	at org.apache.catalina.cluster.mcast.McastServiceImpl$ReceiverThread.run(McastServiceImpl.java:253)

"http-80-Processor147" daemon prio=1 tid=0x084b1208 nid=0x6957 waiting for monitor entry [0x7e8bf000..0x7e8bf83c]
	at org.apache.catalina.cluster.tcp.ReplicationTransmitter.addStats(ReplicationTransmitter.java:702)
	- waiting to lock <0x4e7a4e68> (a org.apache.catalina.cluster.tcp.ReplicationTransmitter)
	at org.apache.catalina.cluster.tcp.ReplicationTransmitter.sendMessageData(ReplicationTransmitter.java:870)
	at org.apache.catalina.cluster.tcp.ReplicationTransmitter.sendMessageClusterDomain(ReplicationTransmitter.java:460)
	at org.apache.catalina.cluster.tcp.SimpleTcpCluster.sendClusterDomain(SimpleTcpCluster.java:1017)
	at org.apache.catalina.cluster.tcp.ReplicationValve.sendSessionReplicationMessage(ReplicationValve.java:333)
	at org.apache.catalina.cluster.tcp.ReplicationValve.invoke(ReplicationValve.java:271)
	at org.apache.catalina.valves.ErrorReportValve.invoke(ErrorReportValve.java:105)
	at org.apache.catalina.core.StandardEngineValve.invoke(StandardEngineValve.java:107)
	at org.apache.catalina.valves.FastCommonAccessLogValve.invoke(FastCommonAccessLogValve.java:495)
	at org.apache.catalina.connector.CoyoteAdapter.service(CoyoteAdapter.java:148)
	at org.apache.coyote.http11.Http11Processor.process(Http11Processor.java:868)
	at org.apache.coyote.http11.Http11BaseProtocol$Http11ConnectionHandler.processConnection(Http11BaseProtocol.java:663)
	at org.apache.tomcat.util.net.PoolTcpEndpoint.processSocket(PoolTcpEndpoint.java:527)
	at org.apache.tomcat.util.net.LeaderFollowerWorkerThread.runIt(LeaderFollowerWorkerThread.java:80)
	at org.apache.tomcat.util.threads.ThreadPool$ControlRunnable.run(ThreadPool.java:684)
	at java.lang.Thread.run(Thread.java:595)
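The "- locked" and "waiting to lock" lines in these dumps form a chain: thread 147 waits on <0x4e7a4e68>, which Cluster-MembershipReceiver holds, and Cluster-MembershipReceiver in turn waits on <0x4e7864f8>, which thread 143 holds while it writes to the socket. The same chain can be read at runtime from `java.lang.management.ThreadMXBean`. The helper below is a hypothetical sketch (the class and method names are assumptions) that lists each BLOCKED thread together with the owner and identity of the monitor it is waiting for.

```java
import java.lang.management.ManagementFactory;
import java.lang.management.ThreadInfo;
import java.lang.management.ThreadMXBean;
import java.util.ArrayList;
import java.util.List;

/** Hypothetical diagnostic helper: lists "waiting to lock" edges like those in the dumps. */
public class BlockedThreadReport {

    /** One entry per BLOCKED thread, formatted as "waiter -> owner (monitor)". */
    public static List<String> blockedEdges() {
        ThreadMXBean mx = ManagementFactory.getThreadMXBean();
        List<String> edges = new ArrayList<>();
        // false, false: only getLockName()/getLockOwnerName() are needed, not the locked-monitor lists.
        for (ThreadInfo info : mx.dumpAllThreads(false, false)) {
            if (info != null && info.getThreadState() == Thread.State.BLOCKED) {
                edges.add(info.getThreadName() + " -> " + info.getLockOwnerName()
                        + " (" + info.getLockName() + ")");
            }
        }
        return edges;
    }

    public static void main(String[] args) throws InterruptedException {
        final Object monitor = new Object();
        Thread waiter = new Thread(() -> {
            synchronized (monitor) {
                // real work would happen here once the monitor is free
            }
        }, "http-waiter");

        synchronized (monitor) {
            waiter.start();
            while (waiter.getState() != Thread.State.BLOCKED) {
                Thread.sleep(10); // wait until the waiter is parked on our monitor
            }
            blockedEdges().forEach(System.out::println);
        }
        waiter.join();
    }
}
```

Run periodically (or from a JMX console) on a node in the state described above, such a report would show hundreds of HTTP threads pointing at the membership thread, and that thread pointing at the one HTTP thread stuck in `socketWrite0`.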

-- 
Configure bugmail: http://issues.apache.org/bugzilla/userprefs.cgi?tab=email
------- You are receiving this mail because: -------
You are the assignee for the bug, or are watching the assignee.

---------------------------------------------------------------------
To unsubscribe, e-mail: dev-unsubscribe@tomcat.apache.org
For additional commands, e-mail: dev-help@tomcat.apache.org

