tomcat-users mailing list archives

From Vince Stewart <stewart.vi...@gmail.com>
Subject Re: BackupManager start fails under heavy load
Date Fri, 28 Jun 2013 18:36:16 GMT
Hi Patrick,
A similar problem has been reported before:
http://tomcat.10.n6.nabble.com/org-apache-catalina-tribes-ChannelException-Operation-has-timed-out-3000-ms-Faulty-members-tcp-64-88-td4656393.html
The important error message from your log output is:

   ................

>   Caused by: org.apache.catalina.tribes.ChannelException: Operation has
> timed out(3000 ms.).; Faulty members:tcp://{10, 230, 20, 86}:4001;
> tcp://{10, 230, 20, 87}:4001; tcp://{10, 230, 20, 94}:4001; tcp://{10, 230,
> 20, 95}:4001; tcp://{10, 230, 20, 70}:4001; tcp://{10, 230, 20, 89}:4001;
>
>     at org.apache.catalina.tribes.transport.nio.ParallelNioSender.sendMessage(ParallelNioSender.java:109)
> ...............
>

I am familiar with the code that generates this message. The sending
operation is abandoned for any sender object that has not been drained of
data within "timeout" milliseconds. The "timeout" parameter is declared in
the AbstractSender class as (long) 3000. By my reckoning, a small change to
the timeout value could result in a large reduction in messaging failures.
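
To illustrate why a small increase might help, here is a minimal sketch. It is
not the Tribes code: the class name, the method and all of the drain times are
made up for illustration. It only models the rule described above, that a
sender which has not drained its data within the timeout is reported as faulty.

```java
import java.util.ArrayList;
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;

// Hypothetical illustration only; this is NOT the Tribes source code.
final class DrainTimeoutSketch {

    /**
     * Returns the members whose simulated send is still incomplete
     * once timeoutMs has elapsed.
     *
     * @param drainTimeMs member name -> milliseconds that member needs to drain its data
     * @param timeoutMs   the send timeout in milliseconds (3000 by default, per the message above)
     */
    static List<String> faultyMembers(Map<String, Long> drainTimeMs, long timeoutMs) {
        List<String> faulty = new ArrayList<>();
        for (Map.Entry<String, Long> entry : drainTimeMs.entrySet()) {
            // A sender that needs longer than the timeout is abandoned.
            if (entry.getValue() > timeoutMs) {
                faulty.add(entry.getKey());
            }
        }
        return faulty;
    }

    public static void main(String[] args) {
        // Assumed drain times under heavy load; invented numbers, not measurements.
        Map<String, Long> drain = new LinkedHashMap<>();
        drain.put("member-A", 2400L);
        drain.put("member-B", 3200L);
        drain.put("member-C", 3500L);
        drain.put("member-D", 3900L);
        drain.put("member-E", 6000L);

        System.out.println("faulty at 3000 ms: " + faultyMembers(drain, 3000L).size());
        System.out.println("faulty at 4000 ms: " + faultyMembers(drain, 4000L).size());
    }
}
```

With these invented numbers, four of the five members fail at 3000 ms but only
one fails at 4000 ms. Whether your cluster behaves like this depends on how
your real drain times are distributed.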

According to information from this page:
http://tomcat.apache.org/tomcat-7.0-doc/config/cluster-sender.html

you should be able to increase the timeout parameter by setting a transport
attribute thus:

      <Sender className="org.apache.catalina.tribes.transport.ReplicationTransmitter">
        <Transport className="org.apache.catalina.tribes.transport.nio.PooledParallelSender"
                   timeout="4000"/>
      </Sender>

However, I cannot find the code where the system reads the configuration
to override the default value. If you make the alteration and find that the
error message still reports "3000 ms", this would indicate an oversight in
the coding, which could be reported as a bug.
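
If you want to automate that check, something like the following might do.
This is only a sketch: the class and method names are my own, and the pattern
assumes the message keeps the "timed out(NNNN ms.)" wording shown in your log.

```java
import java.util.OptionalLong;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

// Hypothetical helper; not part of Tomcat or Tribes.
final class ReportedTimeoutCheck {

    // Matches the "timed out(3000 ms.)" fragment of the ChannelException message.
    private static final Pattern TIMED_OUT =
            Pattern.compile("timed\\s+out\\s*\\((\\d{1,18})\\s*ms");

    /** Extracts the timeout, in milliseconds, reported in the log text, if there is one. */
    static OptionalLong reportedTimeoutMs(String logText) {
        Matcher matcher = TIMED_OUT.matcher(logText);
        if (matcher.find()) {
            return OptionalLong.of(Long.parseLong(matcher.group(1)));
        }
        return OptionalLong.empty();
    }

    public static void main(String[] args) {
        String line = "Caused by: org.apache.catalina.tribes.ChannelException: "
                + "Operation has timed out(3000 ms.).; Faulty members:...";
        // If this still shows 3000 after setting timeout="4000",
        // the configured value is apparently not being applied.
        System.out.println(reportedTimeoutMs(line));
    }
}
```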

BTW, your Receiver configuration has
selectorTimeout="100"

The code suggests that this should be the same value as the sender/transport
timeout (i.e. 3000). The documentation says the default is 5000. My
examination of the code suggests that the PooledParallelSender class does
not read the configuration but always uses 5000. Nevertheless, you could
try setting that value to 5000 and see what happens.

BTW my own interest was to implement tribes at Internet connection speed;
by manipulating the parameter in question, my system copes with data
transfers that take multiple seconds.
http://tomcat.10.x6.nabble.com/overcoming-a-message-size-limitation-in-tribes-parallel-messaging-with-NioSender-tt4995446.html
