activemq-users mailing list archives

From Tim Bain <tb...@alumni.duke.edu>
Subject Re: ActiveMQ master-slave topology issue[BUG]
Date Fri, 12 Jun 2015 21:47:18 GMT
Stephan, can you describe which NFS settings resulted in which behavior?
On Jun 12, 2015 8:34 AM, "Stephan Burkard" <sburkard@gmail.com> wrote:

> Anuj
>
> Have a look at https://issues.apache.org/jira/browse/AMQ-5549
>
> You cannot avoid having either two or zero master brokers online during a
> failover. The question is how long this situation lasts (see Arthur
> Naseef's comment on AMQ-5549).
>
> In my failover-tests with NFS shared storage I was able to reproduce very
> different scenarios:
> - the former master broker *never* shuts down
> - the former master broker shuts down after 15 minutes
> - the former master broker shuts down after 20 seconds
>
> The only difference between these scenarios was the NFS settings. My
> overall impression is that failover only works with highly available
> shared storage. As soon as one or more brokers lose the NFS connection,
> the situation becomes unpredictable, and I even "managed" to corrupt the
> persistence store during my tests.
>
> Also note the both-brokers-down problem
> (https://issues.apache.org/jira/browse/AMQ-5568) that I discovered during
> my tests.
>
> Cheers
> Stephan
>
>
> On Thu, Apr 30, 2015 at 2:40 PM, Tim Bain <tbain@alumni.duke.edu> wrote:
>
> > An NFS problem was the first thing I thought of when I saw out-of-order
> > log lines, especially since you've had that problem before.  And this
> > outage lasted for over two minutes (which doesn't count as "slow" in my
> > book; that's "unavailable" or "down" to me), which is pretty crazy;
> > hopefully your ops team has looked into how that happened and taken steps
> > to ensure it doesn't happen again.
> >
> > An NFS outage does justify a failover to the backup broker; to understand
> > why, think about what prevents failover during normal operation.  The
> > master broker holds a file system lock on a DB lock file, and the slave
> > broker tries repeatedly to acquire the same lock.  As long as it can't,
> > it knows the master broker is up and it can't become the master; at the
> > point where the lock disappears because the master broker can't access
> > NFS, the slave becomes active (at least, if it can access NFS; if not,
> > then it doesn't know that it could become active and it can't read the
> > messages from disk anyway).  This is exactly what you would want to
> > happen.
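
[Editorial illustration] The lock handshake described above can be sketched
with plain java.nio file locks. This is only a minimal sketch and not
ActiveMQ's actual locker code: the class SharedFileLockSketch, the nested
Broker class, and its methods are hypothetical names, and a temp file
stands in for the DB lock file on shared storage.

```java
import java.io.IOException;
import java.nio.channels.FileChannel;
import java.nio.channels.FileLock;
import java.nio.channels.OverlappingFileLockException;
import java.nio.file.Files;
import java.nio.file.Path;
import java.nio.file.StandardOpenOption;

public class SharedFileLockSketch {

    /** Hypothetical broker stand-in: it is "master" only while it holds the lock. */
    static final class Broker implements AutoCloseable {
        private final FileChannel channel;
        private FileLock lock;

        Broker(Path lockFile) throws IOException {
            this.channel = FileChannel.open(lockFile,
                    StandardOpenOption.CREATE, StandardOpenOption.WRITE);
        }

        /** One acquisition attempt; a slave would repeat this with a sleep in between. */
        boolean tryBecomeMaster() throws IOException {
            if (isMaster()) {
                return true;
            }
            try {
                lock = channel.tryLock(); // null if another process holds the lock
            } catch (OverlappingFileLockException e) {
                lock = null; // another channel in this same JVM holds the lock
            }
            return isMaster();
        }

        /**
         * Local view only: isValid() stays true until the lock is released or
         * the channel is closed, so it cannot notice a lock lost on the server.
         */
        boolean isMaster() {
            return lock != null && lock.isValid();
        }

        void releaseLock() throws IOException {
            if (lock != null) {
                lock.release();
                lock = null;
            }
        }

        @Override
        public void close() throws IOException {
            releaseLock();
            channel.close();
        }
    }

    public static void main(String[] args) throws IOException {
        // Assumption: a local temp file stands in for the lock file on NFS.
        Path lockFile = Files.createTempFile("broker", ".lock");
        try (Broker a = new Broker(lockFile); Broker b = new Broker(lockFile)) {
            System.out.println("A master: " + a.tryBecomeMaster());
            System.out.println("B master: " + b.tryBecomeMaster());
            a.releaseLock(); // simulates the master giving up the lock
            System.out.println("B master after A released: " + b.tryBecomeMaster());
        } finally {
            Files.deleteIfExists(lockFile);
        }
    }
}
```

On a local file system the second broker should only succeed once the
first has released the lock. What the sketch cannot show is the NFS case
discussed in this thread, where the lock goes away on the server side
while the old master's local view still says it holds it.
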
> >
> > The real problem here is the one in your last paragraph: when the slave
> > acquires the lock because the master can't access NFS, the master isn't
> > detecting that and becoming the slave.  I'd suggest you try to recreate
> > this failure (in a dev environment) by causing the master broker to be
> > unable to access NFS and confirming that the master remains active even
> > after the slave becomes the master.  Assuming that happens, submit a JIRA
> > bug report to describe the problem.  Make sure you provide lots of
> > details about your NFS setup (include version numbers, file system type,
> > etc.) and about the O/Ses of the machines the brokers run on, since the
> > behavior might vary based on some of those things and you want to make
> > sure that whoever investigates this can reproduce it.  But make sure you
> > can reproduce it first.
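
[Editorial illustration] The missing piece described above is a master
that notices it can no longer reach the lock file and steps down. Below is
a minimal sketch of such a probe, assuming a hypothetical helper named
lockLooksHealthy; it is not how ActiveMQ implements its check, and a temp
file again stands in for the lock file on NFS.

```java
import java.io.IOException;
import java.nio.ByteBuffer;
import java.nio.channels.FileChannel;
import java.nio.channels.FileLock;
import java.nio.file.Files;
import java.nio.file.Path;
import java.nio.file.StandardOpenOption;

public class LockKeepAliveSketch {

    /**
     * Hypothetical keep-alive probe: returns true only if the lock is still
     * valid locally and a small write through the locked channel succeeds.
     */
    static boolean lockLooksHealthy(FileChannel channel, FileLock lock) {
        if (lock == null || !lock.isValid()) {
            return false;
        }
        try {
            channel.write(ByteBuffer.wrap(new byte[] {1}), 0); // positional write at offset 0
            channel.force(true); // ask the OS to push the write to storage
            return true;
        } catch (IOException e) {
            return false; // storage unreachable: treat the lock as lost
        }
    }

    public static void main(String[] args) throws IOException {
        // Assumption: a local temp file stands in for the lock file on NFS.
        Path lockFile = Files.createTempFile("broker", ".lock");
        try (FileChannel channel = FileChannel.open(lockFile, StandardOpenOption.WRITE)) {
            FileLock lock = channel.lock();
            System.out.println("healthy while locked: " + lockLooksHealthy(channel, lock));
            lock.release();
            System.out.println("healthy after release: " + lockLooksHealthy(channel, lock));
        } finally {
            Files.deleteIfExists(lockFile);
        }
    }
}
```

A master would run a probe like this periodically and stop its transport
connectors when it fails. One caveat: depending on the NFS mount options,
the write may block instead of throwing an exception, so a real check
would probably also need a timeout around the probe.
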
> >
> > Tim
> >
> > Hi,
> >
> > The logs really did arrive in this order, and after checking the system
> > further I found that NFS (where we keep KahaDB and the broker logs) was
> > slow during that time.
> >
> > I can understand delayed logs or slow I/O operations during that time,
> > but that does not explain why the failover broker also opened its
> > transport connector.
> >
> > The main concern here is that the (master-slave-shared-storage) topology
> > is broken, which should not happen in any case. If I/O operations are not
> > happening, the master broker should stop and let the failover broker
> > serve the clients, but here the master didn't stop and both opened the
> > connector.
> >
> > Thanks,
> > Anuj
> >
> >
> >
> >
> > --
> > View this message in context:
> > http://activemq.2283324.n4.nabble.com/ActiveMQ-master-slave-topology-issue-BUG-tp4695677p4695731.html
> > Sent from the ActiveMQ - User mailing list archive at Nabble.com.
> >
>
