tomcat-users mailing list archives

From Bob DeRemer <bob.dere...@thingworx.com>
Subject RE: Tomcat 7.0.48 JSR-356 Server (appears ??) to be closing websockets under heavy load with REASON (1006) "network name is no longer available"
Date Wed, 06 Nov 2013 19:08:39 GMT


> -----Original Message-----
> From: Bob DeRemer [mailto:bob.deremer@thingworx.com]
> Sent: Wednesday, November 06, 2013 1:41 PM
> To: Tomcat Users List
> Subject: Tomcat 7.0.48 JSR-356 Server (appears ??) to be closing websockets
> under heavy load with REASON (1006) "network name is no longer available"
> 
> BACKGROUND:
> We've been load testing our websocket implementation running behind an EC2
> ELB. The ELB fronts 4 large EC2 instances running Tomcat with our
> websocket implementation. Each Tomcat is configured with the
> following settings:
> 
> <Connector port="80"
>            protocol="org.apache.coyote.http11.Http11NioProtocol"
>            connectionTimeout="20000"
>            maxConnections="-1"
>            maxThreads="10000"
>            redirectPort="443" />
> 
> NOTES about our ELB configuration:
> 
> * It was pre-warmed for 100K+ TCP connections 4 days ago, so it's not a
>   scale issue in the ELB.
> 
> * In addition, we had them increase the idle connection timeout to 15 minutes,
>   because we had already hit idle timeouts before we added a PING.
> 
> 
> We fire up 4 separate client machines, each one creating and connecting
> 25K JSR-356 websockets, all through the ELB. Once connected, we
> have an Executor that uses a configurable number of threads (usually 50-100).
> Each thread grabs a websocket from a queue, delays for a specified
> time, then sends a message. On success, it puts the websocket back on the
> queue. The executor threads continue round-robin style until all websockets
> have sent the specified number of messages. The test program then closes all
> websockets, prints stats and exits.
> 
> PROBLEM:
> The problem we saw last night during a run was that after some period of time,
> the OnClose handlers of the client websockets received the following:
> CloseReason: code [1006], reason [The specified network name is no longer
> available.]
> 
> At this point we can no longer send, so our test considers that websocket
> aborted.
> 
> QUESTION:
> Is anyone aware of what could cause this CloseReason client-side? Can an
> underlying client-side problem cause this, or would it be caused by
> the Tomcat server closing the connection for some reason? I ask because it
> sounds possibly similar to the following bug, even though this was received
> client-side and, to our knowledge, the ELB did not close our connection.
> 
> [Bug 55170] New: [websocket][jsr 356]Thread falls in endless cycle when
> connection is reset
> 
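The load-test loop described above (a fixed pool of threads taking a websocket from a shared queue, waiting, sending one message, and requeuing it on success) can be sketched in Java roughly as follows. This is a minimal sketch, not the actual test harness: `MessageSink`, `Slot` and `RoundRobinLoad` are hypothetical names, and `MessageSink` stands in for a JSR-356 `Session` (a real client would call something like `session.getBasicRemote().sendText(...)`). A failed send marks that connection as aborted and takes it out of rotation, as the test does after a 1006 close.

```java
import java.util.List;
import java.util.concurrent.BlockingQueue;
import java.util.concurrent.ExecutorService;
import java.util.concurrent.Executors;
import java.util.concurrent.LinkedBlockingQueue;
import java.util.concurrent.TimeUnit;
import java.util.concurrent.atomic.AtomicInteger;

// Hypothetical stand-in for a JSR-356 Session; a real client would call
// session.getBasicRemote().sendText(message) here.
interface MessageSink {
    void send(String message) throws Exception;
}

// A connection plus the number of messages it still has to send.
// Only the thread that took it off the queue touches 'remaining'.
final class Slot {
    final MessageSink sink;
    int remaining;

    Slot(MessageSink sink, int remaining) {
        this.sink = sink;
        this.remaining = remaining;
    }
}

final class RoundRobinLoad {
    // Returns {messagesSent, connectionsAborted}. Assumes messagesPerSocket >= 1.
    static int[] run(List<MessageSink> sinks, int messagesPerSocket, int threads, long delayMillis) {
        BlockingQueue<Slot> queue = new LinkedBlockingQueue<>();
        for (MessageSink s : sinks) queue.add(new Slot(s, messagesPerSocket));
        AtomicInteger sent = new AtomicInteger();
        AtomicInteger aborted = new AtomicInteger();
        AtomicInteger unfinished = new AtomicInteger(sinks.size());

        ExecutorService pool = Executors.newFixedThreadPool(threads);
        for (int t = 0; t < threads; t++) {
            pool.execute(() -> {
                try {
                    while (unfinished.get() > 0) {
                        Slot slot = queue.poll(50, TimeUnit.MILLISECONDS);
                        if (slot == null) continue; // other threads hold the remaining slots
                        Thread.sleep(delayMillis);   // the configured per-send delay
                        try {
                            slot.sink.send("hello");
                            sent.incrementAndGet();
                            if (--slot.remaining > 0) queue.add(slot); // requeue on success
                            else unfinished.decrementAndGet();
                        } catch (Exception e) {
                            aborted.incrementAndGet();    // e.g. closed with 1006: drop it
                            unfinished.decrementAndGet();
                        }
                    }
                } catch (InterruptedException e) {
                    Thread.currentThread().interrupt();
                }
            });
        }
        pool.shutdown();
        try {
            pool.awaitTermination(10, TimeUnit.MINUTES);
        } catch (InterruptedException e) {
            Thread.currentThread().interrupt();
        }
        return new int[] {sent.get(), aborted.get()};
    }
}
```

For example, calling `RoundRobinLoad.run(sinks, 3, 4, 1)` with ten sinks that always succeed and one that throws on send returns `{30, 1}`: each working sink sends 3 messages and the failing one is aborted after its first attempt.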

UPDATE: after some further investigation, it appears that one of the EC2 instances failed the
ELB health check, which is just a call to an empty servlet right now.
This would result in ALL connections to THAT specific instance being closed, which explains
the closes.
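Close code 1006 is consistent with this explanation. RFC 6455 reserves 1006 (along with 1005 and 1015) for local use: it must never be sent in a Close frame, and an endpoint reports it when the connection was closed abnormally, for example when the TCP connection drops without a Close frame being exchanged. A client seeing 1006 therefore never received a close from Tomcat; something below the WebSocket layer (here, plausibly the ELB dropping connections to the unhealthy instance) cut the connection. The helper below is an illustrative sketch with hypothetical names; in a JSR-356 `@OnClose` method the numeric code would come from `closeReason.getCloseCode().getCode()`.

```java
// Hypothetical helper for classifying WebSocket close codes (RFC 6455, section 7.4).
final class CloseCodes {
    // These codes must never appear in a Close frame on the wire, so an
    // endpoint that reports one generated it locally.
    static boolean isLocallyGenerated(int code) {
        return code == 1005 || code == 1006 || code == 1015;
    }

    static String describe(int code) {
        switch (code) {
            case 1000: return "normal closure (Close frame from the peer)";
            case 1001: return "going away (Close frame from the peer)";
            case 1005: return "Close frame received without a status code";
            case 1006: return "abnormal closure: connection lost without a Close frame";
            case 1015: return "TLS handshake failure";
            default:   return "other close code " + code;
        }
    }
}
```

Logging `CloseCodes.describe(reason.getCloseCode().getCode())` next to `reason.getReasonPhrase()` in the client's `@OnClose` method would record, per connection, whether the server closed cleanly or the connection was simply lost.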

The mystery is HOW 5 consecutive calls to an empty servlet (/Health) could all fail when the
machine was not even close to capacity (CPU, threads, memory). I'm going to check the access
log written by the AccessLogValve to see if it shows anything.
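One way to narrow this down is to poll `/Health` directly, recording status and latency for each call, and apply a consecutive-failure threshold like the one that took this instance out (5 failures here). The sketch below assumes Java 11+ for `java.net.http.HttpClient`; `HealthProbe`, the timeouts and the polling interval are hypothetical choices for illustration, not the ELB's actual settings. A check counts as failed on a non-200 status, a timeout, or a connection error.

```java
import java.io.IOException;
import java.net.URI;
import java.net.http.HttpClient;
import java.net.http.HttpRequest;
import java.net.http.HttpResponse;
import java.time.Duration;

final class HealthProbe {
    private final HttpClient client = HttpClient.newBuilder()
            .version(HttpClient.Version.HTTP_1_1)
            .connectTimeout(Duration.ofSeconds(5))
            .build();

    // One check: succeeds only on HTTP 200 within the timeout. Logs status and latency.
    boolean checkOnce(URI uri, Duration timeout) {
        HttpRequest req = HttpRequest.newBuilder(uri).timeout(timeout).GET().build();
        long start = System.nanoTime();
        try {
            HttpResponse<Void> resp = client.send(req, HttpResponse.BodyHandlers.discarding());
            long ms = (System.nanoTime() - start) / 1_000_000;
            System.out.printf("%s -> %d in %d ms%n", uri, resp.statusCode(), ms);
            return resp.statusCode() == 200;
        } catch (IOException e) { // includes HttpTimeoutException and connection errors
            System.out.printf("%s -> %s%n", uri, e);
            return false;
        } catch (InterruptedException e) {
            Thread.currentThread().interrupt();
            return false;
        }
    }

    // True as soon as 'threshold' consecutive checks fail (the instance would be
    // marked unhealthy); false if all 'attempts' checks run without reaching it.
    boolean becomesUnhealthy(URI uri, int attempts, int threshold, Duration timeout, Duration interval) {
        int consecutiveFailures = 0;
        for (int i = 0; i < attempts; i++) {
            if (checkOnce(uri, timeout)) {
                consecutiveFailures = 0;
            } else if (++consecutiveFailures >= threshold) {
                return true;
            }
            try {
                Thread.sleep(interval.toMillis());
            } catch (InterruptedException e) {
                Thread.currentThread().interrupt();
                return false;
            }
        }
        return false;
    }
}
```

Pointing the probe at the instance's own address (bypassing the ELB) while the load test runs could help show whether the servlet itself is slow to answer or whether the failures happen between the ELB and Tomcat.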

* If anyone has any other suggestions on what may have happened or things to check, they
would be appreciated.

Thanks
-bob

> Thanks,
> Bob
> 
> http://www.thingworx.com
> Skype: bob.deremer.thingworx
> O: 610.594.6200 x812
> M: 717.881.3986


---------------------------------------------------------------------
To unsubscribe, e-mail: users-unsubscribe@tomcat.apache.org
For additional commands, e-mail: users-help@tomcat.apache.org

