activemq-dev mailing list archives

Site index · List index
Message view « Date » · « Thread »
Top « Date » · « Thread »
From "Manuel Teira (JIRA)" <j...@apache.org>
Subject [jira] Issue Comment Edited: (AMQ-1146) InactivityMonitor: Transport connection disconnected "Channel was inactive for too long."
Date Thu, 18 Oct 2007 08:18:23 GMT

    [ https://issues.apache.org/activemq/browse/AMQ-1146?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel#action_40405
] 

mteira edited comment on AMQ-1146 at 10/18/07 1:17 AM:
-------------------------------------------------------------

I'm having some issues where slow performance in the backend database seems to lead to InactivityIOExceptions.
My opinion is that the fix for this issue is causing those problems: 

Having in mind that the design of the readCheck/writeCheck monitors is that we always should
have a command among two readCheck (produced by normal activity or by the response of a KeepAlive
injected by the writeCheck between two readChecks if needed), synchronizing on the readChecker
variable seems a bad idea:

If the chain of transportListener.onCommand() is lasting more than the negociated MaxInactivityDuration,
we will end with two readCheck executing one after the other, without a writeCheck between
them. The first readCheck success, sets the commandReceived to false, and the second one is
always going to fail, as no new command has been processed, nor a writeCheck has been executed
since the last readCheck.

I don't see the need of synchronizing on the readChecker/writeChecker variables. These are
AtomicBooleans and should be safe to access them from different threads (this seemed to be
the motivation of the fix). Also, synchronizing the readCheck/writeCheck is actually delaying
the execution of the check, and we cannot guarantee anymore the alternating of writeCheck
readCheck execution, that is needed for the solution to work.

Also, synchronizing that is avoiding this code in readCheck() :
{code}     
synchronized (readChecker) {
        if (inReceive.get()) {
                LOG.trace("A receive is in progress");
                return;
        }
{code}

to be reached: because inReceive is only set during onCommand, and that is synchronized on
readChecker also. Furthermore, the right way to do this is returning here when a onCommand
is performed at the same time, and not waiting, to not break the sequence of readCheck/writeCheck
invocations.


In few words, I think that this change breaks the InactivityMonitor, and my opinion is that
it should be rolled back. Also in the 4.1 subtask that I opened some time ago. :-(

Any comments?



      was (Author: mteira):
    I'm having some issues where slow performance in the backend database seems to lead to
InactivityIOExceptions. My opinion is that the fix for this issue is causing those problems:


Having in mind that the design of the readCheck/writeCheck monitors is that we always should
have a command among two readCheck (produced by normal activity or by the response of a KeepAlive
injected by the writeCheck between two readChecks if needed), synchronizing on the readChecker
variable seems a bad idea:

If the chain of transportListener.onCommand() is lasting more than the negociated MaxInactivityDuration,
we will end with two readCheck executing one after the other, without a writeCheck between
them. The first readCheck success, sets the commandReceived to false, and the second one is
always going to fail, as no new command has been processed, nor a writeCheck has been executed
since the last readCheck.

I don't see the need of synchronizing on the readChecker/writeChecker variables. These are
AtomicBooleans and should be safe to access them from different threads (this seemed to be
the motivation of the fix). Also, synchronizing the readCheck/writeCheck is actually delaying
the execution of the check, and we cannot guarantee anymore the alternating of writeCheck
readCheck execution, that is needed for the solution to work.

Also, synchronizing that is avoiding this code in readCheck() :

        synchronized (readChecker) {
            if (inReceive.get()) {
                LOG.trace("A receive is in progress");
                return;
            }


to be reached: because inReceive is only set during onCommand, and that is synchronized on
readChecker also. Furthermore, the right way to do this is returning here when a onCommand
is performed at the same time, and not waiting, to not break the sequence of readCheck/writeCheck
invocations.


In few words, I think that this change breaks the InactivityMonitor, and my opinion is that
it should be rolled back. Also in the 4.1 subtask that I opened some time ago. :-(

Any comments?


  
> InactivityMonitor: Transport connection disconnected "Channel was inactive for too long."
> -----------------------------------------------------------------------------------------
>
>                 Key: AMQ-1146
>                 URL: https://issues.apache.org/activemq/browse/AMQ-1146
>             Project: ActiveMQ
>          Issue Type: Bug
>          Components: Broker
>    Affects Versions: 4.1.0
>         Environment: Windows Server 2003 running broker and Windows XP running client.
Java 1.5.0_08 
>            Reporter: Mari
>            Assignee: Hiram Chirino
>            Priority: Minor
>             Fix For: 5.0.0
>
>         Attachments: InactivityMonitor_changed.java, InactivityMonitor_Original.java
>
>   Original Estimate: 1 hour
>  Remaining Estimate: 1 hour
>
> Sometimes (when the transport is idle for long) the transport gets disconnected with
the message "Channel was inactive for too long." 
>         Apparently this is same as reported in http://www.nabble.com/Channel-was-inactive-for-too-long-t1463069.html#a3971744
and http://www.nabble.com/Durable-consumer-reconnect-problem-tf1716817.html#a6147014 
> Version: 4.1 
> Background: 
>         The class org.apache.activemq.transport.InactivityMonitor runs monitoring threads
to check the read and write of the transport(TCP for example). If it hasn't sent the message
during the given period, it sends a KeepAliveInfo to ensure the connectivity. If it doesn't
receive the message for a given duration, then it disconnects the transport. 
>         By default, the maximumInactivityDuration is 30000 milliseconds. This is the
time interval between the check for read. That is if an end of the transport doesn't receive
any message for this period, then it will close the connection. For the read check, the time
interval is half of maximumInactivityDuration i.e. 15000 millisecond. That is if a message
was not sent during this period, it will send a KeepAliveInfo to ensure the connectivity.

> Problem and the recommended fix: 
>         The InactivityMonitor class uses the flags inReceive and commandReceived variables
in methods readCheck and onCommand. Typically these two methods are called from different
threads. Also, the flags inSend and commandSent are used in methods oneway and writeCheck.
Again these two methods are called from different threads. But, the synchronization was missing
for these. This seems to be a potential cause for the problem. So, synchronization has been
incorporated for these flag usages. 
>         The same class InactivityMonitor is used at both client and server side. In general,
the write check (which sends KeepAliveInfo) happens twice during the time when the read check
happens once. Probably the client and server machines have performance differences and just
to be safer, now the write check is made to happen thrice during the time when the read check
happens once. 
>         The original and the changed source files are attached here. 
> http://www.nabble.com/file/5958/InactivityMonitor_changed.java
> http://www.nabble.com/file/5957/InactivityMonitor_Original.java

-- 
This message is automatically generated by JIRA.
-
You can reply to this email to add a comment to the issue online.


Mime
View raw message