activemq-users mailing list archives

From Tim Bain <tb...@alumni.duke.edu>
Subject Re: HA in Master/Slave with shared mKahaDb not really HA because of slow failover?
Date Fri, 20 Jan 2017 14:04:47 GMT
Also, clients that can't afford to wait for the broker when it goes down
can run embedded brokers that will queue published messages without
blocking the client. But there are disadvantages to using embedded brokers,
so be sure you understand the tradeoffs before you go down that path.
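
To illustrate the store-and-forward idea only, here is a minimal sketch in plain Java. It is not ActiveMQ code and not how an embedded broker is implemented: `Sender`, `BufferingPublisher` and `BufferingPublisherDemo` are hypothetical names, the buffer lives in memory only (so buffered messages are lost if the client process dies), and the sketch assumes a single calling thread.

```java
import java.util.ArrayDeque;
import java.util.ArrayList;
import java.util.Deque;
import java.util.List;

/** Hypothetical stand-in for whatever really sends to the broker (e.g. a JMS producer). */
interface Sender {
    void send(String message) throws Exception; // expected to throw while the broker is unreachable
}

/** Bounded, order-preserving in-memory buffer in front of a Sender (single-threaded sketch). */
class BufferingPublisher {
    private final Deque<String> pending = new ArrayDeque<>();
    private final int capacity;
    private final Sender sender;

    BufferingPublisher(int capacity, Sender sender) {
        this.capacity = capacity;
        this.sender = sender;
    }

    /** Queues the message locally without contacting the broker; returns false if the buffer is full. */
    boolean publish(String message) {
        if (pending.size() >= capacity) {
            return false;
        }
        pending.addLast(message);
        return true;
    }

    /** Forwards buffered messages in order, stops at the first failure, and returns the number sent. */
    int flush() {
        int sent = 0;
        while (!pending.isEmpty()) {
            try {
                sender.send(pending.peekFirst());
            } catch (Exception e) {
                break; // broker still unavailable: keep this and all later messages buffered
            }
            pending.removeFirst(); // remove only after a successful send
            sent++;
        }
        return sent;
    }

    int pendingCount() {
        return pending.size();
    }
}

class BufferingPublisherDemo {
    public static void main(String[] args) {
        List<String> delivered = new ArrayList<>();
        boolean[] brokerUp = {false}; // simulated broker availability
        Sender sender = m -> {
            if (!brokerUp[0]) {
                throw new IllegalStateException("broker unavailable");
            }
            delivered.add(m);
        };

        BufferingPublisher publisher = new BufferingPublisher(100, sender);
        publisher.publish("order-1");
        publisher.publish("order-2");
        publisher.flush();     // broker down: nothing is sent, both messages stay buffered
        brokerUp[0] = true;    // simulated end of failover
        publisher.flush();     // both messages are forwarded in their original order
        System.out.println(delivered);
    }
}
```

A real embedded broker would typically add persistence and its own forwarding to the remote broker, which is where the tradeoffs come in.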

Tim

On Jan 17, 2017 7:16 AM, "Christopher Shannon" <
christopher.l.shannon@gmail.com> wrote:

> Which version are you using? Normally on start up the broker should
> not need to replay the entire journal if the index already exists.  On
> startup the broker tries to determine the last in doubt position from
> the index and only replay from that point.  With HA I would expect
> this to work the same way as the shared directory contains the index
> and journal so I'm wondering if something was detected wrong with the
> index to trigger the full replay.
>
> It might help to turn on debug or trace logging to see if there is
> more information on why there is a full journal replay.  The two
> classes to enable logging on would be
> org.apache.activemq.store.kahadb.MessageDatabase and
> org.apache.activemq.store.kahadb.KahaDBStore
>
> On Tue, Jan 17, 2017 at 2:51 AM, Johannes F. Knauf
> <johannes.knauf@ancud.de> wrote:
> > Hi,
> >
> > I filed a bug with JIRA about HA in Master/Slave mode with shared
> > mKahaDb not being really HA because of extremely slow failover.
> > Depending on the message load, the startup time of the Slave when
> > becoming Master can be seriously slowed down (on the order of
> > minutes), which yields an extremely slow failover and hence a phase
> > of unavailability of the broker.
> >
> > https://issues.apache.org/jira/browse/AMQ-6564
> >
> > Timothy Bish suggested to discuss this issue first here on the Users
> > Mailing List. So I gladly repost.
> >
> > ---
> >
> > Consider the following scenario:
> > * AMQ Host A and Host B are configured exactly the same
> > * Host A and Host B share a common filesystem storage for their
> >   (m)kahadb in order to create HA as described in
> >   http://activemq.apache.org/shared-file-system-master-slave.html
> > * high-traffic scenario, where at each point in time quite some amount
> >   of messages is still in each queue
> >
> > Expected:
> > Given Host A is current master and Host B is polling for the lock
> > every 10 seconds (default),
> > when Host A is going down,
> > then Host B should be able to serve producer enqueue requests after
> > 10 seconds + some microseconds at max.
> >
> > Reality:
> > Host B needs to replay the whole journals before being available to
> > accept new messages again. This can take a long time, especially if
> > consistency checks are required. This means Master/Slave with shared
> > FS is not really providing HA.
> >
> > It is perfectly understandable that for consumers the failover takes
> > that long. They can only continue receiving messages when all
> > journals have been read. Otherwise the order of messages would be
> > destroyed.
> >
> > For producers this is not the case, as AMQ could just create a fresh
> > journal file and start appending immediately. Am I wrong?
> >
> > Also, it seems that each kahaDB in an mKahaDB is checked in sequence,
> > so that in the worst case even less filled queues are not available
> > before everything is checked completely.
> >
> > Long unavailability for producers is unacceptable in most scenarios.
> > It means that all producing clients have to take a serious amount of
> > effort to protect against these scenarios in order not to lose
> > messages (buffering, etc.). Or is there a best practise workaround?
> >
> >
> > ---
> >
> > Any ideas why it is like that?
> >
> > Thanks,
> > Johannes
>
