activemq-users mailing list archives

From Mihkel Nõges <mihkel.no...@transferwise.com>
Subject Re: [Artemis] Master fails to start up after failback
Date Tue, 20 Oct 2015 18:28:05 GMT
Hi Justin,

Thanks for answering!

I did not describe the whole picture, just the part that caused issues. I set
up 2 physical nodes, each running 1 master and 1 slave, crossed over so that
each slave is on a different node from its master. I did not copy any
journals; I created every instance separately with the artemis create command
and made the necessary changes to broker.xml. I started the masters first,
and while watching the logs I noticed that the master I started second took
the role of slave.
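
For reference, a crossed replication pair of this kind is configured per
broker roughly like the sketch below; the group-name value and comments are
illustrative only, not the exact broker.xml used here. On the master instance
of a pair:

   <ha-policy>
      <replication>
         <master>
            <!-- a backup only pairs with a live whose group-name matches -->
            <group-name>pair-a</group-name>
            <!-- on restart, check whether a backup already took over this node's data -->
            <check-for-live-server>true</check-for-live-server>
         </master>
      </replication>
   </ha-policy>

and on the slave instance of the same pair, running on the other physical node:

   <ha-policy>
      <replication>
         <slave>
            <group-name>pair-a</group-name>
            <!-- hand activation back to the original live once it returns and re-syncs -->
            <allow-failback>true</allow-failback>
         </slave>
      </replication>
   </ha-policy>

The second pair would use a different group-name so its backup does not attach
to the wrong master.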

BR!
Mihkel



On 20 October 2015 at 18:36, Justin Bertram <jbertram@apache.com> wrote:

> I'm not sure I understand the point of having an HA policy without any HA
> (i.e. without any backups).  If you want 2 master servers then don't
> configure HA, just configure 2 clustered servers.
>
> Also, make sure you don't copy the journal from one server to another when
> configuring replication as the journal contains the unique ID of each node.
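
For illustration, two independent live servers with no HA at all are tied
together with just a cluster connection, along these lines; the connector
address, the names and the discovery group are placeholders and assume
broadcast/discovery groups like those in the shipped examples:

   <connectors>
      <connector name="netty-connector">tcp://broker3:61616</connector>
   </connectors>

   <cluster-connections>
      <cluster-connection name="my-cluster">
         <connector-ref>netty-connector</connector-ref>
         <!-- assumes a matching broadcast-group/discovery-group is also defined -->
         <discovery-group-ref discovery-group-name="dg-group1"/>
      </cluster-connection>
   </cluster-connections>

There is no <ha-policy> element at all, and each broker keeps its own journal.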
>
>
> Justin
>
> ----- Original Message -----
> From: "Mihkel Nõges" <mihkel.noges@transferwise.com>
> To: users@activemq.apache.org
> Sent: Tuesday, October 20, 2015 9:46:24 AM
> Subject: Re: [Artemis] Master fails to start up after failback
>
> Yes, sorry, I had a typo in my email. I had a replicated conf and the second
> master became a slave for the first.
>
> Mihkel
>
> On 20 October 2015 at 17:44, Justin Bertram <jbertram@apache.com> wrote:
>
> > You can't have 2 masters using the same shared-store.  However, you can
> > have 2 masters each with their own store.
> >
> >
> > Justin
> >
> > ----- Original Message -----
> > From: "Mihkel Nõges" <mihkel.noges@transferwise.com>
> > To: users@activemq.apache.org
> > Sent: Tuesday, October 20, 2015 9:24:21 AM
> > Subject: Re: [Artemis] Master fails to start up after failback
> >
> > Also I had a question earlier about having more than one Artemis master in a
> > single cluster. When I tried this, only one of them became a master; the
> > other became a slave for the first one started, even though I set different
> > group-name values for them in broker.xml. Is this expected behavior?
> >
> > <ha-policy>
> >    <shared-store>
> >       <master>
> >          <group-name>ha-cluster1</group-name>
> >       </master>
> >    </shared-store>
> > </ha-policy>
> >
> > <ha-policy>
> >    <shared-store>
> >       <master>
> >          <group-name>ha-cluster2</group-name>
> >       </master>
> >    </shared-store>
> > </ha-policy>
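
For comparison, the group-name pairing is what the HA documentation describes
for the replication policy; a shared-store live/backup pair is tied together
by pointing both brokers at the same store directories instead. A rough sketch
(the paths and the failover-on-shutdown flag are illustrative only):

   <paging-directory>/mnt/shared/cluster1/paging</paging-directory>
   <bindings-directory>/mnt/shared/cluster1/bindings</bindings-directory>
   <journal-directory>/mnt/shared/cluster1/journal</journal-directory>
   <large-messages-directory>/mnt/shared/cluster1/large-messages</large-messages-directory>

   <ha-policy>
      <shared-store>
         <master>
            <!-- let the backup take over even on a clean shutdown of this broker -->
            <failover-on-shutdown>true</failover-on-shutdown>
         </master>
      </shared-store>
   </ha-policy>

The backup would use the same four directory settings with <slave> in place of
<master>.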
> >
> > Mihkel
> >
> > On 20 October 2015 at 16:53, Mihkel Nõges <mihkel.noges@transferwise.com
> >
> > wrote:
> >
> > > Hi Tim, Clebert!
> > >
> > > Yes, we also considered the alternatives (
> > > http://activemq.apache.org/masterslave.html):
> > > *Shared Storage:*
> > >
> > > We do not have a high-performance shared storage solution. We have something
> > > for our current file storage needs, but its I/O is said to be very slow and
> > > would need to be extended to support the extra load.
> > >
> > > *Replicated LevelDB:*
> > >
> > > It sounds cool, but I'm a little bit afraid of moving from one experimental
> > > solution to the next. I noticed LevelDB does not support some of the features
> > > we need, like scheduled message delivery:
> > > http://activemq.apache.org/replicated-leveldb-store.html
> > > "The LevelDB store does not yet support storing data associated with Delay
> > > and Schedule Message Delivery. Those are stored in separate non-replicated
> > > KahaDB data files. Unexpected results will occur if you use Delay and
> > > Schedule Message Delivery with the replicated LevelDB store since that data
> > > will not be there when the master fails over to a slave."
> > >
> > > Notes like this make me feel very uneasy about the solution.
> > >
> > > *JDBC:*
> > >
> > > So it seems to me that the most reliable highly available messaging solution
> > > in ActiveMQ 5 is JDBC. We have MySQL running as our main DB, and setting up a
> > > second DB for messaging would be fairly simple, with standard procedures for
> > > maintenance, backups, disaster recovery etc.
> > >
> > > I consider this only a temporary solution until we can use a more performant
> > > alternative configuration, and I'm not expecting Artemis to ever implement
> > > support for JDBC storage.
> > >
> > > We are using messaging in the process of splitting our monolithic application
> > > into micro-services. As this is a gradual process, the volume of messages
> > > will be very small in the beginning, so a low-performing but reliable
> > > JDBC-backed broker configuration seems good for a start.
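
For reference, the JDBC master/slave approach in ActiveMQ 5 amounts to pointing
every broker's persistence adapter at the same database: the first broker to
obtain the database lock acts as master and the others wait. A minimal
activemq.xml sketch (the datasource id, URL and credentials are placeholders,
and the exact BasicDataSource package depends on the libraries shipped with the
broker):

   <persistenceAdapter>
      <jdbcPersistenceAdapter dataDirectory="${activemq.data}" dataSource="#mysql-ds"/>
   </persistenceAdapter>

   <!-- defined next to the <broker> element, inside <beans> -->
   <bean id="mysql-ds" class="org.apache.commons.dbcp.BasicDataSource" destroy-method="close">
      <property name="driverClassName" value="com.mysql.jdbc.Driver"/>
      <property name="url" value="jdbc:mysql://mysql-host:3306/activemq?relaxAutoCommit=true"/>
      <property name="username" value="activemq"/>
      <property name="password" value="activemq"/>
      <property name="poolPreparedStatements" value="true"/>
   </bean>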
> > >
> > > I was trying to find a more orthodox approach, but could not find or get good
> > > suggestions. I tried disabling fail-back; starting the master like that
> > > resulted in both servers spamming the logs with warnings that another server
> > > with the same ID is running. Do I understand correctly that I should have
> > > backed up and removed the /data folder of the master, reconfigured it as a
> > > slave and then started it?
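
The manual procedure being asked about would look roughly like this on the old
master, using the instance layout of the replicated-failback example that
appears later in this thread (a sketch under those assumptions, not a verified
recipe):

   # stop the old master if its process is still up
   server0/bin/artemis-service stop

   # keep a copy of its journal before discarding it, then clear the data directory
   cp -a server0/data server0/data.bak
   rm -R server0/data

   # edit server0/etc/broker.xml so the replication policy declares <slave> instead
   # of <master>, then bring the broker back as a backup of the current live
   server0/bin/artemis-service start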
> > >
> > > Can you give me some overview of existing production deployments of highly
> > > available, failing-over (not necessarily failing-back) Artemis installations?
> > > I may change my mind about going with it from the start.
> > >
> > > Mihkel
> > >
> > >
> > > On 20 October 2015 at 16:19, Clebert Suconic <
> clebert.suconic@gmail.com>
> > > wrote:
> > >
> > >> As far as I know ActiveMQ5 doesn't do failback on the master-slave
> > >> journal... and it doesn't have any protocol to sync the data between
> > >> master and slave.
> > >>
> > >>
> > >> There is a small regression on the failback that we are dealing with now...
> > >> if you set the master as a backup it would work fine...
> > >>
> > >>
> > >> I think your test case is a bit unorthodox...
> > >>
> > >> TBH production guys usually don't use failback.. they keep the backup
> > >> until they can get to a quiet period and then do the failback (or
> > >> restart the system) under low load.
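
In configuration terms that usually means leaving automatic failback switched
off on the backup, e.g. on a replication slave (a sketch only):

   <ha-policy>
      <replication>
         <slave>
            <!-- after taking over, keep serving; do not hand back automatically -->
            <allow-failback>false</allow-failback>
         </slave>
      </replication>
   </ha-policy>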
> > >>
> > >>
> > >> I also second Tim Bain on your choice for JDBC.
> > >>
> > >> I actually always say this.. if you can use JDBC as a storage for
> > >> messaging.. don't use messaging at all.. just store and retrieve from
> > >> the Database.
> > >>
> > >>
> > >> There's a JIRA open for Artemis on JDBC.. but usually those things are
> > >> written because users want them, not because they need them.
> > >>
> > >> On Tue, Oct 20, 2015 at 3:12 AM, Mihkel Nõges
> > >> <mihkel.noges@transferwise.com> wrote:
> > >> > Yes, I saw that issue too and set myself as a watcher when it was created. I
> > >> > did not think it could be exactly the same, as it is described as presenting
> > >> > itself only under narrow timing-related conditions. My case seems to be much
> > >> > broader and more basic. It seems like nobody has actually tried to set this
> > >> > up in a realistic situation.
> > >> >
> > >> > Do you know of any existing production deployments of Artemis (or HornetQ)
> > >> > with failover? I thought that Artemis, being based on HornetQ, should have
> > >> > its features as stable as the last HornetQ version. We have already been
> > >> > using embedded HornetQ happily for some time. I think it would make a lot of
> > >> > sense to publicly grade the Artemis features by their maturity, and by the
> > >> > usage statistics of each feature if known, so it would be easier to compare
> > >> > the brokers even among the 3 variants of the ActiveMQ family.
> > >> >
> > >> > I think it's safer for us to start building our first messaging features on
> > >> > ActiveMQ 5.12.1 with a JDBC-backed master/slave instead of Artemis, and to
> > >> > switch to Artemis once it has become more stable and our scalability needs
> > >> > have grown enough to make it reasonable. Right now it seems there are still
> > >> > blockers in Artemis big enough to threaten the stability of our system.
> > >> >
> > >> > I did not mean this letter to be negative in any way. On the contrary, I
> > >> > really hope Artemis does well, as it comes with such a great technical
> > >> > foundation and elegant ideas. I think the best thing for Artemis would be to
> > >> > find users that can trust its features and improve it as they grow. This
> > >> > means the nucleus of Artemis must be really solid and stable.
> > >> >
> > >> > BR!
> > >> > Mihkel Nõges
> > >> >
> > >> >
> > >> >
> > >> > On 19 October 2015 at 22:15, Clebert Suconic <
> > clebert.suconic@gmail.com
> > >> >
> > >> > wrote:
> > >> >
> > >> >> Looks related to me:
> > >> >>
> > >> >> https://issues.apache.org/jira/browse/ARTEMIS-256
> > >> >>
> > >> >>
> > >> >>
> > >> >> On Mon, Oct 19, 2015 at 4:04 AM, Mihkel Nõges
> > >> >> <mihkel.noges@transferwise.com> wrote:
> > >> >> > Basic flow of getting unresponsive failback cluster:
> > >> >> > Have machine with Ubuntu 14.04.3
> > >> >> >
> > >> >> >    1. Install libaio1, Java 1.8.0_60, maven 3.3.3, download and extract
> > >> >> >    apache-artemis-1.1.0-bin
> > >> >> >    <http://www.eu.apache.org/dist/activemq/activemq-artemis/1.1.0/apache-artemis-1.1.0-bin.tar.gz>
> > >> >> >    in /opt
> > >> >> >    2. run $ mvn -Prelease install and $ mvn verify in
> > >> >> >    /opt/apache-artemis-1.1.0/examples/features/ha/replicated-failback
> > >> >> >    SUCCESS
> > >> >> >    3. Clean the data folders and start both servers manually:
> > >> >> >    $
> > >> >> >    cd
> > >> >>
> > >>
> > /opt/apache-artemis-1.1.0/examples/features/ha/replicated-failback/target
> > >> >> >    $ rm -R server0/data/
> > >> >> >    $ rm -R server1/data/
> > >> >> >    $ server0/bin/artemis-service start
> > >> >> >    Starting artemis-service
> > >> >> >    artemis-service is now running (7154)
> > >> >> >    $ server1/bin/artemis-service start
> > >> >> >    Starting artemis-service
> > >> >> >    artemis-service is now running (7180)
> > >> >> >    4. Kill master server and wait for slave to take over
> > >> >> >    $ kill -9 7154
> > >> >> >
> > >> >> >    $ tail -f server1/log/artemis.log
> > >> >> >    08:52:54,798 INFO  [org.apache.activemq.artemis.core.server]
> > >> >> AMQ221043:
> > >> >> >    Protocol module found: [artemis-stomp-protocol]. Adding
> protocol
> > >> >> support
> > >> >> >    for: STOMP
> > >> >> >    08:53:02,145 INFO  [org.apache.activemq.artemis.core.server]
> > >> >> AMQ221109:
> > >> >> >    Apache ActiveMQ Artemis Backup Server version 1.1.0 [null]
> > >> started,
> > >> >> waiting
> > >> >> >    live to fail before it gets active
> > >> >> >    08:53:03,582 INFO  [org.apache.activemq.artemis.core.server]
> > >> >> AMQ221024:
> > >> >> >    Backup server
> > >> >> >
> > >> ActiveMQServerImpl::serverUUID=64ddff0f-7636-11e5-bfa8-f5004e6195f0 is
> > >> >> >    synchronized with live-server.
> > >> >> >    08:53:03,777 INFO  [org.apache.activemq.artemis.core.server]
> > >> >> AMQ221031:
> > >> >> >    backup announced
> > >> >> >    08:55:59,292 INFO  [org.apache.activemq.artemis.core.server]
> > >> >> AMQ221037:
> > >> >> >
> > >> ActiveMQServerImpl::serverUUID=64ddff0f-7636-11e5-bfa8-f5004e6195f0 to
> > >> >> >    become 'live'
> > >> >> >    08:55:59,302 WARN  [org.apache.activemq.artemis.core.client]
> > >> >> AMQ212004:
> > >> >> >    Failed to connect to server.
> > >> >> >    08:55:59,778 INFO  [org.apache.activemq.artemis.core.server]
> > >> >> AMQ221003:
> > >> >> >    trying to deploy queue jms.queue.exampleQueue
> > >> >> >    08:55:59,829 WARN  [org.apache.activemq.artemis.core.client]
> > >> >> AMQ212034:
> > >> >> >    There are more than one servers on the network broadcasting
> the
> > >> same
> > >> >> node
> > >> >> >    id. You will see this message exactly once (per node) if a
> node
> > is
> > >> >> >    restarted, in which case it can be safely ignored. But if it
> is
> > >> logged
> > >> >> >    continuously it means you really do have more than one node on
> > the
> > >> >> same
> > >> >> >    network active concurrently with the same node id. This could
> > >> occur
> > >> >> if you
> > >> >> >    have a backup node active at the same time as its live node.
> > >> >> >    nodeID=64ddff0f-7636-11e5-bfa8-f5004e6195f0
> > >> >> >    08:55:59,836 INFO  [org.apache.activemq.artemis.core.server]
> > >> >> AMQ221007:
> > >> >> >    Server is now live
> > >> >> >    08:55:59,869 INFO  [org.apache.activemq.artemis.core.server]
> > >> >> AMQ221020:
> > >> >> >    Started Acceptor at broker3:61617 for protocols
> > >> >> >    [CORE,MQTT,AMQP,HORNETQ,STOMP,OPENWIRE]
> > >> >> >    5. Start the master again and observe the logs:
> > >> >> >    $ server0/bin/artemis-service start
> > >> >> >    Starting artemis-service
> > >> >> >    artemis-service is now running (7388)
> > >> >> >
> > >> >> > $ tail -f server0/log/artemis.log
> > >> >> > 08:57:24,625 INFO  [org.apache.activemq.artemis.core.server]
> > >> AMQ221012:
> > >> >> > Using AIO Journal
> > >> >> > 08:57:24,694 INFO  [org.apache.activemq.artemis.core.server]
> > >> AMQ221043:
> > >> >> > Protocol module found: [artemis-server]. Adding protocol support
> > for:
> > >> >> CORE
> > >> >> > 08:57:24,702 INFO  [org.apache.activemq.artemis.core.server]
> > >> AMQ221043:
> > >> >> > Protocol module found: [artemis-amqp-protocol]. Adding protocol
> > >> support
> > >> >> > for: AMQP
> > >> >> > 08:57:24,731 INFO  [org.apache.activemq.artemis.core.server]
> > >> AMQ221043:
> > >> >> > Protocol module found: [artemis-hornetq-protocol]. Adding
> protocol
> > >> >> support
> > >> >> > for: HORNETQ
> > >> >> > 08:57:24,733 INFO  [org.apache.activemq.artemis.core.server]
> > >> AMQ221043:
> > >> >> > Protocol module found: [artemis-mqtt-protocol]. Adding protocol
> > >> support
> > >> >> > for: MQTT
> > >> >> > 08:57:24,743 INFO  [org.apache.activemq.artemis.core.server]
> > >> AMQ221043:
> > >> >> > Protocol module found: [artemis-openwire-protocol]. Adding
> protocol
> > >> >> support
> > >> >> > for: OPENWIRE
> > >> >> > 08:57:24,878 INFO  [org.apache.activemq.artemis.core.server]
> > >> AMQ221043:
> > >> >> > Protocol module found: [artemis-stomp-protocol]. Adding protocol
> > >> support
> > >> >> > for: STOMP
> > >> >> > 08:57:25,082 INFO  [org.apache.activemq.artemis.core.server]
> > >> AMQ221109:
> > >> >> > Apache ActiveMQ Artemis Backup Server version 1.1.0 [null]
> started,
> > >> >> waiting
> > >> >> > live to fail before it gets active
> > >> >> > 08:57:27,043 INFO  [org.apache.activemq.artemis.core.server]
> > >> AMQ221024:
> > >> >> > Backup server
> > >> >> >
> ActiveMQServerImpl::serverUUID=64ddff0f-7636-11e5-bfa8-f5004e6195f0
> > >> is
> > >> >> > synchronized with live-server.
> > >> >> > 08:57:27,948 INFO  [org.apache.activemq.artemis.core.server]
> > >> AMQ221031:
> > >> >> > backup announced
> > >> >> > 08:57:31,227 WARN  [org.apache.activemq.artemis.core.client]
> > >> AMQ212037:
> > >> >> > Connection failure has been detected: AMQ119015: The connection
> was
> > >> >> > disconnected because of server shutdown [code=DISCONNECTED]
> > >> >> > 08:57:31,252 WARN  [org.apache.activemq.artemis.core.client]
> > >> AMQ212037:
> > >> >> > Connection failure has been detected: AMQ119015: The connection
> was
> > >> >> > disconnected because of server shutdown [code=DISCONNECTED]
> > >> >> > 08:57:31,307 WARN  [org.apache.activemq.artemis.core.client]
> > >> AMQ212037:
> > >> >> > Connection failure has been detected: AMQ119015: The connection
> was
> > >> >> > disconnected because of server shutdown [code=DISCONNECTED]
> > >> >> > 08:57:31,339 INFO  [org.apache.activemq.artemis.core.server]
> > >> AMQ221037:
> > >> >> >
> ActiveMQServerImpl::serverUUID=64ddff0f-7636-11e5-bfa8-f5004e6195f0
> > >> to
> > >> >> > become 'live'
> > >> >> > 08:57:31,360 WARN  [org.apache.activemq.artemis.core.client]
> > >> AMQ212004:
> > >> >> > Failed to connect to server.
> > >> >> > 08:57:31,413 ERROR [org.apache.activemq.artemis.core.server]
> > >> AMQ224008:
> > >> >> > Failed to store id: java.lang.IllegalStateException: Cannot find
> > add
> > >> >> info 1
> > >> >> > at
> > >> >> >
> > >> >>
> > >>
> >
> org.apache.activemq.artemis.core.journal.impl.JournalImpl.appendDeleteRecord(JournalImpl.java:799)
> > >> >> > [artemis-journal-1.1.0.jar:1.1.0]
> > >> >> > at
> > >> >> >
> > >> >>
> > >>
> >
> org.apache.activemq.artemis.core.journal.impl.JournalBase.appendDeleteRecord(JournalBase.java:183)
> > >> >> > [artemis-journal-1.1.0.jar:1.1.0]
> > >> >> > at
> > >> >> >
> > >> >>
> > >>
> >
> org.apache.activemq.artemis.core.journal.impl.JournalImpl.appendDeleteRecord(JournalImpl.java:79)
> > >> >> > [artemis-journal-1.1.0.jar:1.1.0]
> > >> >> > at
> > >> >> >
> > >> >>
> > >>
> >
> org.apache.activemq.artemis.core.persistence.impl.journal.JournalStorageManager.deleteID(JournalStorageManager.java:1194)
> > >> >> > [artemis-server-1.1.0.jar:1.1.0]
> > >> >> > at
> > >> >> >
> > >> >>
> > >>
> >
> org.apache.activemq.artemis.core.persistence.impl.journal.BatchingIDGenerator.deleteID(BatchingIDGenerator.java:152)
> > >> >> > [artemis-server-1.1.0.jar:1.1.0]
> > >> >> > at
> > >> >> >
> > >> >>
> > >>
> >
> org.apache.activemq.artemis.core.persistence.impl.journal.BatchingIDGenerator.cleanup(BatchingIDGenerator.java:75)
> > >> >> > [artemis-server-1.1.0.jar:1.1.0]
> > >> >> > at
> > >> >> >
> > >> >>
> > >>
> >
> org.apache.activemq.artemis.core.persistence.impl.journal.JournalStorageManager.loadBindingJournal(JournalStorageManager.java:
> > >> 1784)
> > >> >> > [artemis-server-1.1.0.jar:1.1.0]
> > >> >> > at
> > >> >> >
> > >> >>
> > >>
> >
> org.apache.activemq.artemis.core.server.impl.ActiveMQServerImpl.loadJournals(ActiveMQServerImpl.java:
> > >> 1625)
> > >> >> > [artemis-server-1.1.0.jar:1.1.0]
> > >> >> > at
> > >> >> >
> > >> >>
> > >>
> >
> org.apache.activemq.artemis.core.server.impl.ActiveMQServerImpl.initialisePart2(ActiveMQServerImpl.java:
> > >> 1535)
> > >> >> > [artemis-server-1.1.0.jar:1.1.0]
> > >> >> > at
> > >> >> >
> > >> >>
> > >>
> >
> org.apache.activemq.artemis.core.server.impl.SharedNothingBackupActivation.run(SharedNothingBackupActivation.java:249)
> > >> >> > [artemis-server-1.1.0.jar:1.1.0]
> > >> >> > at java.lang.Thread.run(Thread.java:745) [rt.jar:1.8.0_60]
> > >> >> > 08:57:31,540 WARN  [org.apache.activemq.artemis.core.server]
> > >> AMQ222173:
> > >> >> > Queue jms.queue.exampleQueue is duplicated during reload. This
> > queue
> > >> will
> > >> >> > be renamed as jms.queue.exampleQueue-0
> > >> >> > 08:57:31,550 ERROR [org.apache.activemq.artemis.core.server]
> > >> AMQ224000:
> > >> >> > Failure in initialisation: java.lang.IllegalStateException:
> Cursor
> > 2
> > >> had
> > >> >> > already been created
> > >> >> > at
> > >> >> >
> > >> >>
> > >>
> >
> org.apache.activemq.artemis.core.paging.cursor.impl.PageCursorProviderImpl.createSubscription(PageCursorProviderImpl.java:97)
> > >> >> > [artemis-server-1.1.0.jar:1.1.0]
> > >> >> > at
> > >> >> >
> > >> >>
> > >>
> >
> org.apache.activemq.artemis.core.server.impl.PostOfficeJournalLoader.initQueues(PostOfficeJournalLoader.java:140)
> > >> >> > [artemis-server-1.1.0.jar:1.1.0]
> > >> >> > at
> > >> >> >
> > >> >>
> > >>
> >
> org.apache.activemq.artemis.core.server.impl.ActiveMQServerImpl.loadJournals(ActiveMQServerImpl.java:
> > >> 1631)
> > >> >> > [artemis-server-1.1.0.jar:1.1.0]
> > >> >> > at
> > >> >> >
> > >> >>
> > >>
> >
> org.apache.activemq.artemis.core.server.impl.ActiveMQServerImpl.initialisePart2(ActiveMQServerImpl.java:
> > >> 1535)
> > >> >> > [artemis-server-1.1.0.jar:1.1.0]
> > >> >> > at
> > >> >> >
> > >> >>
> > >>
> >
> org.apache.activemq.artemis.core.server.impl.SharedNothingBackupActivation.run(SharedNothingBackupActivation.java:249)
> > >> >> > [artemis-server-1.1.0.jar:1.1.0]
> > >> >> > at java.lang.Thread.run(Thread.java:745) [rt.jar:1.8.0_60]
> > >> >> >
> > >> >> >
> > >> >> > On 19 October 2015 at 10:31, Mihkel Nõges <
> > >> mihkel.noges@transferwise.com
> > >> >> >
> > >> >> > wrote:
> > >> >> >
> > >> >> >> Hi Clebert,
> > >> >> >>
> > >> >> >> I do not have any other code to share with you besides the example code in
> > >> >> >> the Artemis 1.1.0 binary deployment package. I'm running
> > >> >> >> org.apache.activemq.artemis.jms.example.ReplicatedFailbackExample
> > >> >> >> and only commented out the serverStart and killServer calls, which I am
> > >> >> >> doing manually.
> > >> >> >>
> > >> >> >> I do not think I do any of the steps too fast, as I tail the server log
> > >> >> >> files in parallel and see that everything is finished before I start the
> > >> >> >> fail-back. I have waited many minutes in between.
> > >> >> >>
> > >> >> >> The only configuration change to the test is replacing the localhost
> > >> >> >> addresses with broker3 to make the cluster accessible remotely.
> > >> >> >>
> > >> >> >> BR!
> > >> >> >> Mihkel
> > >> >> >>
> > >> >> >> On 18 October 2015 at 17:49, Clebert <clebert.suconic@gmail.com
> >
> > >> wrote:
> > >> >> >>
> > >> >> >>> I'm not on my computer now, but it sounds like you are doing a fail-back
> > >> >> >>> immediately after failing over. It takes some time (seconds) for the server
> > >> >> >>> to activate on the backup.
> > >> >> >>>
> > >> >> >>> Later the server will need to copy the data back before it can be activated
> > >> >> >>> in fail-back mode.
> > >> >> >>>
> > >> >> >>> It sounds like the live is not reaching the backup for fail-back.
> > >> >> >>>
> > >> >> >>> I will try looking at it on Monday. Maybe you could post your example at
> > >> >> >>> your GitHub fork.
> > >> >> >>>
> > >> >> >>> -- Clebert Suconic typing on the iPhone.
> > >> >> >>>
> > >> >> >>> > On Oct 18, 2015, at 08:15, Mihkel Nõges <
> > >> >> mihkel.noges@transferwise.com>
> > >> >> >>> wrote:
> > >> >> >>> >
> > >> >> >>> > Hello again!
> > >> >> >>> >
> > >> >> >>> > I would be very grateful if someone could answer my questions. We need the
> > >> >> >>> > high availability to work in order to use the broker in production.
> > >> >> >>> >
> > >> >> >>> > When I run the replicated-failback example on one machine (broker3) it
> > >> >> >>> > succeeds.
> > >> >> >>> >
> > >> >> >>> > It fails when I run the same test - exactly the same servers with a
> > >> >> >>> > slightly modified client - remotely.
> > >> >> >>> >
> > >> >> >>> > I run the client in debug mode from my IDE with the serverStart and
> > >> >> >>> > killServer calls commented out.
> > >> >> >>> > I deleted the data folders and started the servers:
> > >> >> >>> > artemis@broker3
> > >> >>
> > >>
> >
> :/opt/apache-artemis-1.1.0/examples/features/ha/replicated-failback/target$
> > >> >> >>> rm -R server0/data/
> > >> >> >>> >
> > >> >> >>> > artemis@broker3
> > >> >>
> > >>
> >
> :/opt/apache-artemis-1.1.0/examples/features/ha/replicated-failback/target$
> > >> >> >>> rm -R server1/data/
> > >> >> >>> >
> > >> >> >>> > artemis@broker3
> > >> >>
> > >>
> >
> :/opt/apache-artemis-1.1.0/examples/features/ha/replicated-failback/target$
> > >> >> >>> server0/bin/artemis-service start
> > >> >> >>> >
> > >> >> >>> > Starting artemis-service
> > >> >> >>> >
> > >> >> >>> > artemis-service is now running (23357)
> > >> >> >>> >
> > >> >> >>> > artemis@broker3
> > >> >>
> > >>
> >
> :/opt/apache-artemis-1.1.0/examples/features/ha/replicated-failback/target$
> > >> >> >>> server1/bin/artemis-service start
> > >> >> >>> >
> > >> >> >>> > Starting artemis-service
> > >> >> >>> >
> > >> >> >>> > artemis-service is now running (23383)
> > >> >> >>> >
> > >> >> >>> > Starting client and stopping on breakpoint at line 103:
> > >> >> >>> > //ServerUtil.killServer(server0);
> > >> >> >>> > // Step 11. Acknowledging the 2nd half of the sent messages
> > will
> > >> fail
> > >> >> >>> as failover to the
> > >> >> >>> > // backup server has occurred
> > >> >> >>> > try {
> > >> >> >>> >    message0.acknowledge();  //line 103
> > >> >> >>> > killing server0
> > >> >> >>> > artemis@broker3
> > >> >>
> > >>
> >
> :/opt/apache-artemis-1.1.0/examples/features/ha/replicated-failback/target$
> > >> >> >>> kill -9 23357
> > >> >> >>> >
> > >> >> >>> > Proceeding to breakpoint at line 121:
> > >> >> >>> > //server0 = ServerUtil.startServer(args[0],
> > >> >> >>> ReplicatedFailbackExample.class.getSimpleName() + "0", 0,
> 10000);
> > >> >> >>> >
> > >> >> >>> > // Step 11. Acknowledging the 2nd half of the sent messages
> > will
> > >> fail
> > >> >> >>> as failover to the
> > >> >> >>> > // backup server has occurred
> > >> >> >>> > try {
> > >> >> >>> >    message0.acknowledge(); // line 121
> > >> >> >>> > Starting server0:
> > >> >> >>> > artemis@broker3
> > >> >>
> > >>
> >
> :/opt/apache-artemis-1.1.0/examples/features/ha/replicated-failback/target$
> > >> >> >>> server0/bin/artemis-service start
> > >> >> >>> >
> > >> >> >>> > Starting artemis-service
> > >> >> >>> >
> > >> >> >>> > artemis-service is now running (24240)
> > >> >> >>> >
> > >> >> >>> > Server0 writes an ERROR to its log (see attached server0_artemis.log).
> > >> >> >>> > Now when I try to proceed with the client, it writes the following in the
> > >> >> >>> > log and does not exit, but remains hanging forever:
> > >> >> >>> >
> > >> >> >>> > Oct 18, 2015 2:55:34 PM
> > >> >> >>>
> > >> >>
> > >>
> >
> org.apache.activemq.artemis.core.protocol.core.impl.RemotingConnectionImpl
> > >> >> >>> fail
> > >> >> >>> >
> > >> >> >>> > WARN: AMQ212037: Connection failure has been detected:
> > >> AMQ119015: The
> > >> >> >>> connection was disconnected because of server shutdown
> > >> >> [code=DISCONNECTED]
> > >> >> >>> >
> > >> >> >>> > Got message: This is text message 20 (redelivered?: false)
> > >> >> >>> >
> > >> >> >>> > Got exception while acknowledging message: AMQ119014: Timed
> out
> > >> after
> > >> >> >>> waiting 30,000 ms for response when sending packet 43
> > >> >> >>> >
> > >> >> >>> > Got message: This is text message 21 (redelivered?: false)
> > >> >> >>> >
> > >> >> >>> > Got message: This is text message 22 (redelivered?: false)
> > >> >> >>> >
> > >> >> >>> > Got message: This is text message 23 (redelivered?: false)
> > >> >> >>> >
> > >> >> >>> > Got message: This is text message 24 (redelivered?: false)
> > >> >> >>> >
> > >> >> >>> > Got message: This is text message 25 (redelivered?: false)
> > >> >> >>> >
> > >> >> >>> > Got message: This is text message 26 (redelivered?: false)
> > >> >> >>> >
> > >> >> >>> > Got message: This is text message 27 (redelivered?: false)
> > >> >> >>> >
> > >> >> >>> > Got message: This is text message 28 (redelivered?: false)
> > >> >> >>> >
> > >> >> >>> > Got message: This is text message 29 (redelivered?: false)
> > >> >> >>> >
> > >> >> >>> > As a result the slave (server1) remains stopped instead of being restarted
> > >> >> >>> > as expected, and the master (server0) process appears to be running but
> > >> >> >>> > does not accept any connections.
> > >> >> >>> >
> > >> >> >>> > Exactly the same behavior is observable every time I try this.
> > >> >> >>> >
> > >> >> >>> > BR!
> > >> >> >>> > Mihkel
> > >> >> >>> >
> > >> >> >>> >> On 13 October 2015 at 20:17, Mihkel Nõges <
> > >> >> >>> mihkel.noges@transferwise.com> wrote:
> > >> >> >>> >> Hi Clebert,
> > >> >> >>> >>
> > >> >> >>> >> No test, just doing it on the command line with standalone servers. I'm
> > >> >> >>> >> using 1.1.0 installed with wget, not the snapshot.
> > >> >> >>> >>
> > >> >> >>> >> I'm wondering what the suggested procedure should be for admins making
> > >> >> >>> >> changes to an HA cluster of 2 or 3 Artemis nodes. If one of the nodes is
> > >> >> >>> >> master by configuration, do they need to change its config to slave before
> > >> >> >>> >> restarting it, to have a seamless change process, and only make an instance
> > >> >> >>> >> master by configuration if all the instances need to be restarted?
> > >> >> >>> >>
> > >> >> >>> >> I also tried a cluster with 2 masters and 2 slaves with 2 separate
> > >> >> >>> >> group-name values, but for some reason the second master I started
> > >> >> >>> >> immediately became a slave for the first. I expected it to become a
> > >> >> >>> >> clustered, load-balancing parallel master. Our loads are not yet high
> > >> >> >>> >> enough to require more than one master, so it's just a theoretical
> > >> >> >>> >> question.
> > >> >> >>> >>
> > >> >> >>> >> BR!
> > >> >> >>> >> Mihkel
> > >> >> >>> >>
> > >> >> >>> >>> On 13 October 2015 at 20:03, Clebert Suconic <
> > >> >> >>> clebert.suconic@gmail.com> wrote:
> > >> >> >>> >>> The master needs to copy its data from the backup back to
> > live
> > >> >> before
> > >> >> >>> >>> it's activated.
> > >> >> >>> >>>
> > >> >> >>> >>> Do you have a test replicating this?
> > >> >> >>> >>>
> > >> >> >>> >>> Did you try the snapshot build?
> > >> >> >>> >>>
> > >> >> >>> >>> On Tue, Oct 13, 2015 at 11:58 AM, Mihkel Nõges
> > >> >> >>> >>> <mihkel.noges@transferwise.com> wrote:
> > >> >> >>> >>> > Hi,
> > >> >> >>> >>> >
> > >> >> >>> >>> > I configured a replicating HA master-slave pair of Artemis 1.1.0 instances
> > >> >> >>> >>> > on Ubuntu 14.04.3.
> > >> >> >>> >>> >
> > >> >> >>> >>> > When I kill the master, the slave takes over as expected and starts serving
> > >> >> >>> >>> > as the new master. When I then start the old master, it fails with the
> > >> >> >>> >>> > following errors in the log:
> > >> >> >>> >>> >
> > >> >> >>> >>> > 16:35:46,476 ERROR
> > [org.apache.activemq.artemis.core.server]
> > >> >> >>> AMQ224008:
> > >> >> >>> >>> > Failed to store id: java.lang.IllegalStateException:
> Cannot
> > >> find
> > >> >> >>> add info 1
> > >> >> >>> >>> > at
> > >> >> >>> >>> >
> > >> >> >>>
> > >> >>
> > >>
> >
> org.apache.activemq.artemis.core.journal.impl.JournalImpl.appendDeleteRecord(JournalImpl.java:799)
> > >> >> >>> >>> > [artemis-journal-1.1.0.jar:1.1.0]
> > >> >> >>> >>> > at
> > >> >> >>> >>> >
> > >> >> >>>
> > >> >>
> > >>
> >
> org.apache.activemq.artemis.core.journal.impl.JournalBase.appendDeleteRecord(JournalBase.java:183)
> > >> >> >>> >>> > [artemis-journal-1.1.0.jar:1.1.0]
> > >> >> >>> >>> > at
> > >> >> >>> >>> >
> > >> >> >>>
> > >> >>
> > >>
> >
> org.apache.activemq.artemis.core.journal.impl.JournalImpl.appendDeleteRecord(JournalImpl.java:79)
> > >> >> >>> >>> > [artemis-journal-1.1.0.jar:1.1.0]
> > >> >> >>> >>> > at
> > >> >> >>> >>> >
> > >> >> >>>
> > >> >>
> > >>
> >
> org.apache.activemq.artemis.core.persistence.impl.journal.JournalStorageManager.deleteID(JournalStorageManager.java:1194)
> > >> >> >>> >>> > [artemis-server-1.1.0.jar:1.1.0]
> > >> >> >>> >>> > at
> > >> >> >>> >>> >
> > >> >> >>>
> > >> >>
> > >>
> >
> org.apache.activemq.artemis.core.persistence.impl.journal.BatchingIDGenerator.deleteID(BatchingIDGenerator.java:152)
> > >> >> >>> >>> > [artemis-server-1.1.0.jar:1.1.0]
> > >> >> >>> >>> > at
> > >> >> >>> >>> >
> > >> >> >>>
> > >> >>
> > >>
> >
> org.apache.activemq.artemis.core.persistence.impl.journal.BatchingIDGenerator.cleanup(BatchingIDGenerator.java:75)
> > >> >> >>> >>> > [artemis-server-1.1.0.jar:1.1.0]
> > >> >> >>> >>> > at
> > >> >> >>> >>> >
> > >> >> >>>
> > >> >>
> > >>
> >
> org.apache.activemq.artemis.core.persistence.impl.journal.JournalStorageManager.loadBindingJournal(JournalStorageManager.java:
> > >> >> >>> 1784)
> > >> >> >>> >>> > [artemis-server-1.1.0.jar:1.1.0]
> > >> >> >>> >>> > at
> > >> >> >>> >>> >
> > >> >> >>>
> > >> >>
> > >>
> >
> org.apache.activemq.artemis.core.server.impl.ActiveMQServerImpl.loadJournals(ActiveMQServerImpl.java:
> > >> >> >>> 1625)
> > >> >> >>> >>> > [artemis-server-1.1.0.jar:1.1.0]
> > >> >> >>> >>> > at
> > >> >> >>> >>> >
> > >> >> >>>
> > >> >>
> > >>
> >
> org.apache.activemq.artemis.core.server.impl.ActiveMQServerImpl.initialisePart2(ActiveMQServerImpl.java:
> > >> >> >>> 1535)
> > >> >> >>> >>> > [artemis-server-1.1.0.jar:1.1.0]
> > >> >> >>> >>> > at
> > >> >> >>> >>> >
> > >> >> >>>
> > >> >>
> > >>
> >
> org.apache.activemq.artemis.core.server.impl.SharedNothingBackupActivation.run(SharedNothingBackupActivation.java:249)
> > >> >> >>> >>> > [artemis-server-1.1.0.jar:1.1.0]
> > >> >> >>> >>> > at java.lang.Thread.run(Thread.java:745)
> [rt.jar:1.8.0_60]
> > >> >> >>> >>> >
> > >> >> >>> >>> > 16:35:46,572 WARN
> > [org.apache.activemq.artemis.core.server]
> > >> >> >>> AMQ222173:
> > >> >> >>> >>> > Queue jms.queue.DLQ is duplicated during reload. This
> queue
> > >> will
> > >> >> be
> > >> >> >>> renamed
> > >> >> >>> >>> > as jms.queue.DLQ-0
> > >> >> >>> >>> > 16:35:46,572 ERROR
> > [org.apache.activemq.artemis.core.server]
> > >> >> >>> AMQ224000:
> > >> >> >>> >>> > Failure in initialisation:
> java.lang.IllegalStateException:
> > >> >> Cursor
> > >> >> >>> 2 had
> > >> >> >>> >>> > already been created
> > >> >> >>> >>> > at
> > >> >> >>> >>> >
> > >> >> >>>
> > >> >>
> > >>
> >
> org.apache.activemq.artemis.core.paging.cursor.impl.PageCursorProviderImpl.createSubscription(PageCursorProviderImpl.java:97)
> > >> >> >>> >>> > [artemis-server-1.1.0.jar:1.1.0]
> > >> >> >>> >>> > at
> > >> >> >>> >>> >
> > >> >> >>>
> > >> >>
> > >>
> >
> org.apache.activemq.artemis.core.server.impl.PostOfficeJournalLoader.initQueues(PostOfficeJournalLoader.java:140)
> > >> >> >>> >>> > [artemis-server-1.1.0.jar:1.1.0]
> > >> >> >>> >>> > at
> > >> >> >>> >>> >
> > >> >> >>>
> > >> >>
> > >>
> >
> org.apache.activemq.artemis.core.server.impl.ActiveMQServerImpl.loadJournals(ActiveMQServerImpl.java:
> > >> >> >>> 1631)
> > >> >> >>> >>> > [artemis-server-1.1.0.jar:1.1.0]
> > >> >> >>> >>> > at
> > >> >> >>> >>> >
> > >> >> >>>
> > >> >>
> > >>
> >
> org.apache.activemq.artemis.core.server.impl.ActiveMQServerImpl.initialisePart2(ActiveMQServerImpl.java:
> > >> >> >>> 1535)
> > >> >> >>> >>> > [artemis-server-1.1.0.jar:1.1.0]
> > >> >> >>> >>> > at
> > >> >> >>> >>> >
> > >> >> >>>
> > >> >>
> > >>
> >
> org.apache.activemq.artemis.core.server.impl.SharedNothingBackupActivation.run(SharedNothingBackupActivation.java:249)
> > >> >> >>> >>> > [artemis-server-1.1.0.jar:1.1.0]
> > >> >> >>> >>> > at java.lang.Thread.run(Thread.java:745)
> [rt.jar:1.8.0_60]
> > >> >> >>> >>> >
> > >> >> >>> >>> > As a result both the master and the slave remain inaccessible, and no
> > >> >> >>> >>> > further restarts solve the situation.
> > >> >> >>> >>> >
> > >> >> >>> >>> > Attached also master and slave broker.xml files.
> > >> >> >>> >>> >
> > >> >> >>> >>> > BR!
> > >> >> >>> >>> >
> > >> >> >>> >>> > Mihkel Nõges
> > >> >> >>> >>>
> > >> >> >>> >>>
> > >> >> >>> >>>
> > >> >> >>> >>> --
> > >> >> >>> >>> Clebert Suconic
> > >> >> >>> >
> > >> >> >>>
> > >> >> >>
> > >> >> >>
> > >> >>
> > >> >>
> > >> >>
> > >> >> --
> > >> >> Clebert Suconic
> > >> >>
> > >>
> > >>
> > >>
> > >> --
> > >> Clebert Suconic
> > >>
> > >
> > >
> >
>
