activemq-users mailing list archives

From Rob Davies <rajdav...@gmail.com>
Subject Re: ActiveMQ+NMS+TCP Connection Problems
Date Wed, 10 Sep 2008 06:09:40 GMT
Awesome Jim!

On 10 Sep 2008, at 06:48, Jim Gomes wrote:

> FYI, the NMS trunk now has the keep alive support implemented.  You can turn
> it on with the URI parameters "wireFormat.MaxInactivityDuration=nnnn" and
> "wireFormat.MaxInactivityDurationInitialDelay=nnnn", where 'nnnn' is the
> number of milliseconds.  The initial delay parameter is optional and does not
> need to be set along with the first.  It should operate just like the Java
> client.  I observed that the server will send a KeepAliveInfo command to the
> client periodically.  The client then responds back.  This should keep the
> socket connection alive even when no messages are flowing.  I would be
> willing to bet that this is what the two ActiveMQ servers are doing to each
> other, which is why that solution worked for you.
>
> Best,
> Jim
>
> On Tue, Sep 9, 2008 at 8:32 AM, Bryan Murphy <bmurphy1976@gmail.com> wrote:
>
>> We basically run a server here in our local office behind a firewall, and
>> the rest of our stuff out on Amazon's EC2 cloud.  We suspect there were
>> issues with NAT timeouts and half-dead TCP connections.
>>
>> The specific behaviors we saw using NMS manifested themselves in the
>> following ways:
>>
>> 1. Client blocked on TCP connection waiting for messages, server does not
>> think client is connected anymore.
>>
>> 2. Client blocked on TCP connection, server reports *multiple* listeners for
>> a queue that should only have one listener (the number changed over time,
>> tending to tick upwards and then downwards, probably after the server
>> timed out a dead TCP connection; we sometimes saw a listener count upwards
>> of 9 or 10 when there should only be 1).
>>
>> 3. Clients do not appear to always re-establish the connection to the server
>> once the connection is dead.  We frequently had to restart clients, and
>> occasionally had to restart the server.
>>
>> 4. Message queues that were idle for long periods at a time exhibited
>> problematic behavior.  Message queues that were active remained available
>> (a huge indicator of what was going on after fixing #5).
>>
>> 5. Hitting ^C to kill our application and not handling break to properly
>> close connections caused behaviors very similar to what we were eventually
>> seeing with our TCP connections.  This, of course, made the issue that much
>> more confusing and difficult to debug since not all communication problems
>> were rooted at the network layer and the results were at least initially
>> maddeningly inconsistent.
>>
>> We experimented with more aggressive request timeouts on the transport
>> layer/session/connection (even modified the driver to ensure these were
>> getting set), setting up static routes, opening up firewall ports and
>> playing with the TCP timeouts (at least on our end, we have no control on
>> the Amazon side).  We tried a prefetch size of one and tried to enable the
>> keep alive but never figured out how to do it.  The only solution that
>> worked was the ActiveMQ to ActiveMQ bridge, and I suspect some of that may
>> have to do with the fact that we were never able to get keep alives working
>> and we have no control over fine-grained NAT settings on the Amazon side.
>>
>> Bryan
>>
>>
>> On Tue, Sep 9, 2008 at 10:09 AM, James Strachan <james.strachan@gmail.com> wrote:
>>
>>> Maybe the WAN is dropping connections; we have failover in Java; am
>>> not sure we've added that to NMS yet have we?
>>>
>>> 2008/9/9 Jim Gomes <e.semog@gmail.com>:
>>>> Hi Bryan,
>>>> That's interesting.  I wonder where the problem is with the ActiveMQ => NMS
>>>> connection.  Without knowing your exact network topology, I can't point to
>>>> where the problem is.  All I can do is speak to my experience, and I have
>>>> been able to keep connections alive for a very long time without errors,
>>>> both with high and low activity, even going over what my infrastructure
>>>> team has told me is a WAN connection.
>>>>
>>>> Best,
>>>> Jim
>>>>
>>>> On Tue, Sep 9, 2008 at 7:35 AM, Bryan Murphy <bmurphy1976@gmail.com> wrote:
>>>>
>>>>> Thanks for the info.  I suspected that's what the timeout meant, but you
>>>>> never really know until you ask.
>>>>>
>>>>> Anyway, we finally solved our issue.  We set up two instances of ActiveMQ
>>>>> in the two data centers to forward messages back and forth between each
>>>>> other.  This is working much better for us.  It seems the ActiveMQ to
>>>>> ActiveMQ communication is a bit more robust than the ActiveMQ to Apache.NMS
>>>>> communication (at least when running over a WAN).
>>>>>
>>>>> Bryan
>>>>>
>>>>> On Mon, Sep 8, 2008 at 2:49 PM, Jim Gomes <e.semog@gmail.com> wrote:
>>>>>
>>>>>> Hi Bryan,
>>>>>> I can't answer all of your questions yet, but I can answer some of them.
>>>>>>
>>>>>> 1. As far as the ResponseTimeout property goes, that is used for network
>>>>>> timeouts.  It's not a JMS timeout value like TimeToLive.  The
>>>>>> ResponseTimeout is used by the client to wait for a response from the
>>>>>> broker.  Since a network call is inherently a blocking operation (send
>>>>>> request, wait for response), if we never receive a response from a
>>>>>> dead/hung broker, the client will hang as well.  The ResponseTimeout lets
>>>>>> the client abort waiting for the response from the broker.  This can be
>>>>>> set to whatever performance constraints your application requires.  In a
>>>>>> WAN environment, this might be set to something fairly high where there is
>>>>>> a lot of latency in network round-trips.  The socket connection is not
>>>>>> dropped.  The client simply stops waiting for the broker to respond and
>>>>>> goes into its error-handling code for a non-response.
>>>>>>
>>>>>> 2. I see the marshalling code for the KeepAliveInfo, but like you I don't
>>>>>> see how this is turned on or controlled from the client side.  This would
>>>>>> need more investigation to see if it is enabled via a URI parameter, or if
>>>>>> new code needs to be written to enable its use.
>>>>>>
>>>>>> 3. Can't answer the server-side socket issue.  Don't know that code.
>>>>>>
>>>>>>
>>>>>
>>>>
>>>
>>>
>>>
>>> --
>>> James
>>> -------
>>> http://macstrac.blogspot.com/
>>>
>>> Open Source Integration
>>> http://open.iona.com
>>>
>>
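
For anyone wanting to try the keep-alive parameters Jim describes at the top of
this message, here is a minimal sketch of how they might be appended to a broker
URI.  Only the two parameter names come from this thread.  The helper name
with_keep_alive, the host broker.example.com, the port, and the millisecond
values are made up for illustration, and the sketch only builds the URI string;
it does not talk to NMS or a broker.

```python
from urllib.parse import urlencode


def with_keep_alive(broker_uri, max_inactivity_ms, initial_delay_ms=None):
    """Append the keep-alive wire format parameters to a broker URI.

    Hypothetical helper: only the two parameter names are taken from the
    thread; everything else here is an illustrative assumption.
    """
    params = {"wireFormat.MaxInactivityDuration": max_inactivity_ms}
    # The initial delay is optional and may be left out entirely.
    if initial_delay_ms is not None:
        params["wireFormat.MaxInactivityDurationInitialDelay"] = initial_delay_ms
    # Respect a query string that is already present on the URI.
    separator = "&" if "?" in broker_uri else "?"
    return broker_uri + separator + urlencode(params)


# Hypothetical broker address and durations, chosen only for the example.
uri = with_keep_alive("tcp://broker.example.com:61616", 30000, 10000)
print(uri)
```

The resulting string would then be passed to whatever connection factory the
client uses, in place of the plain broker URI.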


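The ResponseTimeout behaviour described in the oldest quoted message (stop
waiting for the broker, but leave the socket alone) can be illustrated with a
small language-neutral sketch.  This is not the NMS implementation: the names
wait_for_response and ResponseTimeoutError are hypothetical, and a queue stands
in for whatever mechanism delivers broker responses to the waiting caller.

```python
import queue


class ResponseTimeoutError(Exception):
    """Raised when no response arrives in time (hypothetical name)."""


def wait_for_response(responses, timeout_seconds):
    """Block until a response is available or the timeout expires.

    `responses` is a queue.Queue that a reader thread would fill as
    responses arrive.  On timeout we only stop waiting; nothing here
    closes the underlying connection.
    """
    try:
        return responses.get(timeout=timeout_seconds)
    except queue.Empty:
        raise ResponseTimeoutError(
            f"no response within {timeout_seconds} seconds"
        ) from None


# A response that is already available is returned immediately.
pending = queue.Queue()
pending.put("ack")
print(wait_for_response(pending, timeout_seconds=0.1))

# With nothing pending, the caller falls into its error handling instead
# of hanging forever.
try:
    wait_for_response(pending, timeout_seconds=0.1)
except ResponseTimeoutError as error:
    print("timed out:", error)
```

Over a WAN the timeout would normally be set well above the expected
round-trip latency, as the quoted explanation suggests.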