activemq-users mailing list archives

From Tim Bain <tb...@alumni.duke.edu>
Subject Re: Static: network connectors and maxReconnectAttempts
Date Fri, 24 Oct 2014 20:14:45 GMT
Gary,

I was able to get failover working between brokers in a network of brokers
using the maxReconnectAttempts=0 URI option.  However, when I tried adding
priorityBackup=true, I ran into problems.
http://activemq.2283324.n4.nabble.com/priorityBackup-not-supported-with-masterslave-td4677626.html#a4677945
and https://issues.apache.org/jira/browse/AMQ-4720 indicate that
priorityBackup won't work with broker-to-broker failover connections, for
the same reasons you and I went through last month.
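
For anyone following along, the connector URIs in question look roughly like
the ones this small helper builds. It is only a sketch: the host names, port,
and helper names are made up for illustration, and the static:(failover:(...))
nesting is an assumption about how these URIs are usually written, not
something checked against a particular release. The second URI is the
priorityBackup variant that ran into problems.

```java
import java.util.List;

public class NetworkConnectorUris {
    // Builds a failover: URI over the given broker URIs with reconnects
    // disabled, so a connection failure is reported upward instead of hidden.
    static String failover(List<String> brokers, boolean priorityBackup) {
        String uri = "failover:(" + String.join(",", brokers) + ")?maxReconnectAttempts=0";
        if (priorityBackup) {
            uri += "&priorityBackup=true";
        }
        return uri;
    }

    // Wraps an inner transport URI in the static: discovery transport.
    static String staticOver(String inner) {
        return "static:(" + inner + ")";
    }

    public static void main(String[] args) {
        // Hypothetical broker addresses, primary listed first.
        List<String> brokers = List.of("tcp://primary:61616", "tcp://backup:61616");
        System.out.println(staticOver(failover(brokers, false)));
        System.out.println(staticOver(failover(brokers, true)));
    }
}
```

The resulting string is what would go in the uri attribute of a
networkConnector element.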

For us, this makes failover networkConnectors useless (including ones using
the masterslave transport, since it's just chrome on top of failover).  Our
goal is to minimize the number of hops a message has to take, and the lack
of fail-back behavior on the broker-to-broker connections introduces an
extra hop: messages continue going to the backup and then have to be
forwarded to the primary, where all the clients now are.  For us, a static
mesh is better than a failover network with this sub-optimal routing, so
we'll stay with that until the static transport is able to handle failover
transports with the priorityBackup option enabled.

I looked in JIRA and couldn't find any enhancement request for making the
static transport handle this gracefully (your comments on
https://issues.apache.org/jira/browse/AMQ-4720 do indicate that that's what
would need to happen to fix that bug, but I think it's better as a
stand-alone request), so I submitted
https://issues.apache.org/jira/browse/AMQ-5411 to capture it.

But I think there's also a workaround that could be implemented: if
maxReconnectAttempts=0, when the priority connection is established following a
failover, the failover transport can kill both connections (the old one to
the backup broker and the new one to the priority broker), let the failure
bubble up to the static transport, and let it use the failover transport to
reconnect (to the priority URI, since it's now up).  I've submitted
https://issues.apache.org/jira/browse/AMQ-5412 to capture that workaround
request, in case doing the full rewrite described in AMQ-5411 isn't an
option in the near term.
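
To make the proposed sequence concrete, here is a toy model of it. None of
this is ActiveMQ code: ToyFailover, its fields, and the event strings are all
hypothetical, and the "static layer" is just the caller reconnecting after the
failure is raised. It only illustrates the order of events I have in mind for
AMQ-5412, under the assumption that maxReconnectAttempts=0 is set.

```java
import java.util.ArrayList;
import java.util.List;

public class FailbackSketch {
    // Toy stand-in for a failover transport with maxReconnectAttempts=0
    // and a priority URI. Not the real FailoverTransport.
    static class ToyFailover {
        final String priority;
        final String backup;
        String connected;                       // URI currently in use, or null
        final List<String> events = new ArrayList<>();

        ToyFailover(String priority, String backup) {
            this.priority = priority;
            this.backup = backup;
        }

        // Connects to the priority broker if it is up, otherwise to the backup.
        void connect(boolean priorityUp) {
            connected = priorityUp ? priority : backup;
            events.add("connected:" + connected);
        }

        // Called once a connection to the priority broker has been established
        // while we are still on the backup. Closes both connections and reports
        // a failure to the layer above. Returns true if a failure was raised.
        boolean onPriorityAvailable() {
            if (connected == null || connected.equals(priority)) {
                return false;                   // nothing to fail back from
            }
            events.add("closed:" + connected);  // old connection to the backup
            events.add("closed:" + priority);   // new connection to the priority broker
            connected = null;
            events.add("failure-raised");
            return true;
        }
    }

    public static void main(String[] args) {
        ToyFailover t = new ToyFailover("tcp://primary:61616", "tcp://backup:61616");
        t.connect(false);                       // primary is down, so we land on the backup
        if (t.onPriorityAvailable()) {
            t.connect(true);                    // layer above reconnects; primary is now up
        }
        System.out.println(t.events);
    }
}
```

The point of closing both connections is that the bridge sees an ordinary
transport failure and goes through its normal stop/restart path, rather than
having the failover transport swap connections underneath it.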

Tim

On Mon, Sep 29, 2014 at 9:51 AM, Tim Bain <tbain@alumni.duke.edu> wrote:

> Sounds good; thanks for the explanation.
>
> On Mon, Sep 29, 2014 at 4:17 AM, Gary Tully <gary.tully@gmail.com> wrote:
>
>> everything is possible! but they evolved independently, hence the overlap
>> in functionality
>>
>> On 26 September 2014 16:02, Tim Bain <tbain@alumni.duke.edu> wrote:
>>
>> > Would it be possible for the failover transport to use the same
>> > DiscoveryListener mechanism that the static transport uses, but that's
>> > just not how it's been implemented?  Or is there something fundamental
>> > about why static is allowed to do its own reconnections (notifying the
>> > bridge via the event handlers on the bridge's DiscoveryListener
>> > interface) but failover has to let connection failures bubble up to
>> > the bridge?
>> >
>> > Thanks for taking the time to clarify this, by the way.
>> >
>> > On Fri, Sep 26, 2014 at 4:14 AM, Gary Tully <gary.tully@gmail.com> wrote:
>> >
>> > > the failover transport maintains a bunch of state -
>> > > connections/sessions/producers/consumers/transactions/messages/acks
>> > > so that it can replay those to maintain and recreate the jms client
>> > > view. However, a network bridge is not a standard jms client -
>> > > specifically in the duplex case, but I think there are potential
>> > > issues in the non-duplex case also.
>> > > So a failover reconnect will not guarantee that the network bridge
>> > > is fully functional. The bridge needs to be stopped and restarted to
>> > > successfully clean up and resume.
>> > > In other words, the network bridge needs to be aware of transport
>> > > failures as they occur. The intent of the failover: transport is to
>> > > hide those.
>> > >
>> > >
>> > >
>> > > On 25 September 2014 19:37, Tim Bain <tbain@alumni.duke.edu> wrote:
>> > >
>> > > > Based on the comments that you and Torsten made in the links from
>> > > > my first message, I had understood that for networkConnectors
>> > > > between brokers, you should not allow the discovery transport to
>> > > > perform reconnects, because it was important for the network bridge
>> > > > to be notified of the disconnection and reconnection.  You said
>> > > > that that happens automatically for static discovery transports
>> > > > (and I see the onServiceAdd() and onServiceRemove() methods in
>> > > > DiscoveryNetworkConnector that would handle those events), but
>> > > > what's different about failover that makes the same
>> > > > DiscoveryListener mechanism not work?
>> > > >
>> > > > On Thu, Sep 25, 2014 at 9:21 AM, Gary Tully <gary.tully@gmail.com> wrote:
>> > > >
>> > > > > maxReconnectAttempts=0 relates to the use of failover only, where
>> > > > > you use failover to choose between a list of broker urls
>> > > > > (typically a pair for master slave). masterSlave sets
>> > > > > maxReconnectAttempts=0 on the underlying failover url.
>> > > > > The static discovery, which is implemented by the
>> > > > > SimpleDiscoveryAgent, can do retries and backoff etc.
>> > > > > see:
>> > > > > https://github.com/apache/activemq/blob/d54e0d6ab590b6a6148a5e2629c45b95d3f40eb8/activemq-client/src/main/java/org/apache/activemq/transport/discovery/simple/SimpleDiscoveryAgent.java#L42
>> > > > >
>> > > > > The network bridge is a discovery listener, it gets told to
>> > > > > add/remove services (urls) that are discovered/retried.
>> > > > >
>> > > > >
>> > > > > On 24 September 2014 20:20, Tim Bain <tbain@alumni.duke.edu> wrote:
>> > > > >
>> > > > > > Gary, Torsten, and others have said in various places that
>> > > > > > broker-to-broker networkConnectors should set
>> > > > > > maxReconnectAttempts=0 to allow reconnection to be handled by
>> > > > > > the network bridge.  (Sources:
>> > > > > > 1 <http://tmielke.blogspot.com/2011/09/activemq-network-bridge-to-masterslave.html>,
>> > > > > > 2 <http://activemq.2283324.n4.nabble.com/Persistent-messages-disappearing-td4681353.html>,
>> > > > > > 3 <http://grokbase.com/t/activemq/users/1427v9eqkf/prioritybackup-not-supported-with-masterslave>)
>> > > > > > Torsten (link 1) was talking about static: network connectors,
>> > > > > > while Gary's quotes in the other two links were related to
>> > > > > > failover: (or masterslave:, which is just chrome on top of
>> > > > > > failover:), but if it's a requirement of the network bridge
>> > > > > > that it be the one to re-establish the connection, it
>> > > > > > shouldn't matter what the underlying transport is.
>> > > > > >
>> > > > > > It's obvious in FailoverTransport
>> > > > > > <http://grepcode.com/file/repo1.maven.org/maven2/org.apache.activemq/activemq-client/5.10.0/org/apache/activemq/transport/failover/FailoverTransport.java#FailoverTransport>
>> > > > > > how maxReconnectAttempts=0 gets processed to mean "don't try
>> > > > > > to reconnect", allowing the network bridge to re-establish the
>> > > > > > connection, and there are notes in
>> > > > > > http://activemq.apache.org/failover-transport-reference.html
>> > > > > > explaining that this interpretation of the value "0" was
>> > > > > > implemented in 5.6.0
>> > > > > > (https://issues.apache.org/jira/browse/AMQ-3542).  There's no
>> > > > > > similar code in SimpleDiscoveryAgent
>> > > > > > <http://grepcode.com/file/repo1.maven.org/maven2/org.apache.activemq/activemq-all/5.10.0/org/apache/activemq/transport/discovery/simple/SimpleDiscoveryAgent.java#SimpleDiscoveryAgent>
>> > > > > > (which handles connection attempts for the static: transport
>> > > > > > <http://activemq.apache.org/static-transport-reference.html>,
>> > > > > > as I understand it) to interpret "-1" as "reconnect forever"
>> > > > > > and "0" as "don't reconnect".
>> > > > > >
>> > > > > > Is Gary's and Torsten's advice about maxReconnectAttempts not
>> > > > > > applicable to static: network connectors for some reason that
>> > > > > > I'm not understanding?  Or should the changes Gary made in
>> > > > > > AMQ-3542 have been applied to all protocols that include
>> > > > > > reconnection attempts?  (Do I need to open a JIRA for this?)
>> > > > > >
>> > > > > > And a related question: when using the static: transport to
>> > > > > > establish a broker mesh, if we set maxReconnectAttempts=0, is
>> > > > > > there a way to perform exponential backoff at the network
>> > > > > > bridge, so it doesn't continually try to reconnect (and spam
>> > > > > > the logs) when one broker in the mesh is offline for a while?
>> > > > > > The only way I see to control exponential backoff is within
>> > > > > > the static: transport via the useExponentialBackOff=true
>> > > > > > option; searching the source code (I'm looking at 5.8.0), I
>> > > > > > don't see any references to exponential backoff in any code
>> > > > > > that seems to be related to network bridges...
>> > > > > >
>> > > > > > Thanks,
>> > > > > > Tim
>> > > > > >
>> > > > >
>> > > > >
>> > > > >
>> > > > > --
>> > > > > http://redhat.com
>> > > > > http://blog.garytully.com
>> > > > >
>> > > >
>> > >
>> > >
>> > >
>> > > --
>> > > http://redhat.com
>> > > http://blog.garytully.com
>> > >
>> >
>>
>>
>>
>> --
>> http://redhat.com
>> http://blog.garytully.com
>>
>
>
