activemq-users mailing list archives

From "Jim Gomes" <e.se...@gmail.com>
Subject Re: ActiveMQ+NMS+TCP Connection Problems
Date Wed, 10 Sep 2008 05:48:33 GMT
FYI, the NMS trunk now has keep-alive support implemented.  You can turn
it on with the URI parameters "wireFormat.MaxInactivityDuration=nnnn" and
"wireFormat.MaxInactivityDurationInitialDelay=nnnn", where nnnn is the
number of milliseconds.  The initial delay parameter is optional and does
not have to be set together with the first.  It should operate just like
the Java client.  I observed that the server periodically sends a
KeepAliveInfo command to the client, and the client then responds.  This
should keep the socket connection alive even when no messages are flowing.
I would be willing to bet that this is what the two ActiveMQ servers are
doing to each other, which is why that solution worked for you.
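
For illustration only, here is a small Python sketch of how a connection URI
carrying the two parameters named above might be assembled.  The
"tcp://host:port" form, the host name and the timeout values are
assumptions made for the example; only the two wireFormat parameter names
come from this message.

```python
from urllib.parse import urlencode


def build_broker_uri(host, port, max_inactivity_ms, initial_delay_ms=None):
    """Return a broker URI string with the keep-alive parameters appended.

    build_broker_uri is a hypothetical helper, not part of NMS.
    """
    params = {"wireFormat.MaxInactivityDuration": max_inactivity_ms}
    # The initial delay is optional, so only add it when it was supplied.
    if initial_delay_ms is not None:
        params["wireFormat.MaxInactivityDurationInitialDelay"] = initial_delay_ms
    return f"tcp://{host}:{port}?{urlencode(params)}"


# Placeholder host and values, both in milliseconds.
uri = build_broker_uri("broker.example.com", 61616, 30000, 10000)
print(uri)
```

The resulting string would then be handed to the NMS connection factory in
place of the plain broker URI.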

Best,
Jim

On Tue, Sep 9, 2008 at 8:32 AM, Bryan Murphy <bmurphy1976@gmail.com> wrote:

> We basically run a server here in our local office behind a firewall, and
> the rest of our stuff out on Amazon's EC2 cloud.  We suspect there were
> issues with NAT timeouts and half dead TCP connections.
> The specific behaviors we saw using NMS manifested themselves in the
> following ways:
>
> 1. Client blocked on TCP connection waiting for messages, server does not
> think client is connected anymore.
>
> 2. Client blocked on TCP connection, server reports *multiple* listeners
> for a queue that should only have one listener (the number changes over
> time, tended to tick upwards and then downwards, probably after the server
> timed out a dead TCP connection; sometimes saw a listener count upwards of
> 9 or 10 when there should only be 1).
>
> 3. Clients do not appear to always re-establish connection to server once
> connection is dead.  Frequently had to restart clients, occasionally had to
> restart server.
>
> 4. Message queues that were idle for long periods at a time exhibited
> problematic behavior.  Message queues that were active remained available
> (a huge indicator of what was going on once we had fixed #5).
>
> 5. Hitting ^C to kill our application and not handling break to properly
> close connections caused behaviors very similar to what we were eventually
> seeing with our TCP connections.  This, of course, made the issue that much
> more confusing and difficult to debug since not all communication problems
> were rooted at the network layer and the results were at least initially
> maddeningly inconsistent.
>
> We experimented with more aggressive request timeouts on the transport
> layer/session/connection (even modified the driver to ensure these were
> getting set), setting up static routes, opening up firewall ports and
> playing with the TCP timeouts (at least on our end, we have no control on
> the Amazon side).  We tried prefetch size of one and tried to enable the
> keep alive but never figured out how to do it.  The only solution that
> worked was the ActiveMQ to ActiveMQ bridge, and I suspect some of that may
> be because we were never able to get keep alives working and we
> have no control over fine-grained NAT settings on the Amazon side.
>
> Bryan
>
>
> On Tue, Sep 9, 2008 at 10:09 AM, James Strachan <james.strachan@gmail.com> wrote:
>
> > Maybe the WAN is dropping connections; we have failover in Java; am
> > not sure we've added that to NMS yet have we?
> >
> > 2008/9/9 Jim Gomes <e.semog@gmail.com>:
> > > Hi Bryan,
> > > That's interesting.  I wonder where the problem is with ActiveMQ => NMS
> > > connection.  Without knowing your exact network topology, I can't point
> > > to where the problem is.  All I can do is speak to my experience and I
> > > have been able to keep connections alive for a very long time without
> > > errors, both with high- and low-activity, even going over what my
> > > infrastructure team has told me is a WAN connection.
> > >
> > > Best,
> > > Jim
> > >
> > > On Tue, Sep 9, 2008 at 7:35 AM, Bryan Murphy <bmurphy1976@gmail.com> wrote:
> > >
> > >> Thanks for the info.  I suspected that's what the timeout meant, but
> > >> you never really know until you ask.
> > >> Anyway, we finally solved our issue.  We set up two instances of
> > >> ActiveMQ in the two data centers to forward messages back and forth
> > >> between each other.  This is working much better for us.  It seems the
> > >> ActiveMQ to ActiveMQ communication is a bit more robust than the
> > >> ActiveMQ to Apache.NMS communication (at least when running over a WAN).
> > >>
> > >> Bryan
> > >>
> > >> On Mon, Sep 8, 2008 at 2:49 PM, Jim Gomes <e.semog@gmail.com> wrote:
> > >>
> > >> > Hi Bryan,
> > >> > I can't answer all of your questions, yet.  But I can answer some of
> > >> > them, anyway.
> > >> >
> > >> > 1. As far as the ResponseTimeout property goes, that is used for
> > >> > network timeouts.  It's not a JMS timeout value like TimeToLive.  The
> > >> > ResponseTimeout is used by the client to wait for a response from the
> > >> > broker.  Since a network call is inherently a blocking operation (send
> > >> > request, wait for response), if we never receive a response from a
> > >> > dead/hung broker, the client will hang as well.  The ResponseTimeout
> > >> > lets the client abort waiting for the response from the broker.  This
> > >> > can be set to whatever performance constraints your application
> > >> > requires.  In a WAN environment, this might be set to something fairly
> > >> > high where there is a lot of latency in network round-trips.  The
> > >> > socket connection is not dropped.  The client simply stops waiting for
> > >> > the broker to respond and goes into its error-handling code for a
> > >> > non-response.
> > >> >
> > >> > 2. I see the marshalling code for the KeepAliveInfo, but like you I
> > >> > don't see how this is turned on or controlled from the client-side.
> > >> > This would need more investigation to see if it is enabled via a URI
> > >> > parameter, or if new code needs to be written to enable its use.
> > >> >
> > >> > 3. Can't answer the server-side socket issue.  Don't know that code.
> > >> >
> > >> >
> > >>
> > >
> >
> >
> >
> > --
> > James
> > -------
> > http://macstrac.blogspot.com/
> >
> > Open Source Integration
> > http://open.iona.com
> >
>
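
The ResponseTimeout behaviour described in the quoted Sep 8 message (stop
waiting for the broker after a timeout, without dropping the socket) can be
sketched roughly as follows.  This is not NMS code: the queue stands in for
the channel on which broker responses arrive, and wait_for_response and
ResponseTimeoutError are hypothetical names used only for this example.

```python
import queue


class ResponseTimeoutError(Exception):
    """Raised when no response arrives in time (hypothetical name)."""


def wait_for_response(responses, timeout_s):
    """Block until a response is available or timeout_s seconds elapse."""
    try:
        return responses.get(timeout=timeout_s)
    except queue.Empty:
        # Only the wait is abandoned; nothing here closes the connection.
        raise ResponseTimeoutError(f"no response within {timeout_s} s") from None


# Simulate a dead/hung broker: nothing is ever put on the queue.
pending = queue.Queue()
try:
    wait_for_response(pending, 0.1)
    outcome = "responded"
except ResponseTimeoutError:
    outcome = "timed out"  # the caller's error handling for a non-response
print(outcome)
```

A longer timeout_s would correspond to the "fairly high" setting suggested
for high-latency WAN links.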
