Date: Tue, 20 Oct 2015 09:19:05 -0400
Subject: Re: [Artemis] Master fails to start up after failback
From: Clebert Suconic
To: users@activemq.apache.org

As far as I know, ActiveMQ 5 doesn't do failback on the master-slave journal, and it doesn't have any protocol to sync the data between master and slave.

There is a small regression in failback that we are dealing with now. If you set the master as a backup, it would work fine.

I think your test case is a bit unorthodox. TBH, production guys usually don't use failback. They keep running on the backup until they can get to a quiet period and then do the failback (or restart the system) under low load.

I also second Tim Bain on your choice of JDBC. I actually always say this: if you can use JDBC as storage for messaging, don't use messaging at all. Just store and retrieve from the database.

There's a JIRA open for Artemis on JDBC, but usually those things are written because users want it, not because they need it.

On Tue, Oct 20, 2015 at 3:12 AM, Mihkel Nõges wrote:

> Yes, I saw that issue too and set myself as a watcher when it was created. I did not think it could be exactly the same, as it is described as presenting itself only in narrow, timing-related conditions. My case seems to be much broader and more basic. It seems nobody has actually tried to set this up in a realistic situation.
>
> Do you know of any existing production deployments of Artemis (or HornetQ) with failover? I thought Artemis, being based on HornetQ, should have its features as stable as the last HornetQ version. We have already used embedded HornetQ happily for some time. I think it would make a lot of sense to publicly grade the Artemis features by maturity, with usage statistics for each feature if known, so it would be easier to compare the brokers, even among the 3 variants of the ActiveMQ family.
>
> I think it's safer for us to start building our first messaging features on ActiveMQ 5.12.1 with JDBC-backed master-slave instead of Artemis, and to switch to Artemis once it has become more stable and our scalability needs have grown enough to make it reasonable. Right now it seems there are still blockers in Artemis big enough to threaten the stability of our system.
>
> I do not mean this letter to be negative in any way. On the contrary, I really hope Artemis does well, as it comes with such a great technical foundation and elegant ideas. I think the best thing for Artemis would be to find users who can trust its features and improve it as they grow. This means the nucleus of Artemis must be really solid and stable.
>
> BR!
> Mihkel Nõges
>
> On 19 October 2015 at 22:15, Clebert Suconic wrote:
>
>> Looks related to me:
>>
>> https://issues.apache.org/jira/browse/ARTEMIS-256
>>
>> On Mon, Oct 19, 2015 at 4:04 AM, Mihkel Nõges wrote:
>> > Basic flow to get an unresponsive failback cluster, on a machine with Ubuntu 14.04.3:
>> >
>> > 1. Install libaio1, Java 1.8.0_60, maven 3.3.3, download and extract apache-artemis-1.1.0-bin
>> >    <http://www.eu.apache.org/dist/activemq/activemq-artemis/1.1.0/apache-artemis-1.1.0-bin.tar.gz>
>> >    in /opt
>> > 2. Run $ mvn -Prelease install and $ mvn verify in
>> >    /opt/apache-artemis-1.1.0/examples/features/ha/replicated-failback
>> >    SUCCESS
>> > 3. Clean data folders and start both servers manually:
>> >    $ cd /opt/apache-artemis-1.1.0/examples/features/ha/replicated-failback/target
>> >    $ rm -R server0/data/
>> >    $ rm -R server1/data/
>> >    $ server0/bin/artemis-service start
>> >    Starting artemis-service
>> >    artemis-service is now running (7154)
>> >    $ server1/bin/artemis-service start
>> >    Starting artemis-service
>> >    artemis-service is now running (7180)
>> > 4. Kill master server and wait for slave to take over:
>> >    $ kill -9 7154
>> >
>> >    $ tail -f server1/log/artemis.log
>> > 08:52:54,798 INFO [org.apache.activemq.artemis.core.server] AMQ221043: Protocol module found: [artemis-stomp-protocol]. Adding protocol support for: STOMP
>> > 08:53:02,145 INFO [org.apache.activemq.artemis.core.server] AMQ221109: Apache ActiveMQ Artemis Backup Server version 1.1.0 [null] started, waiting live to fail before it gets active
>> > 08:53:03,582 INFO [org.apache.activemq.artemis.core.server] AMQ221024: Backup server ActiveMQServerImpl::serverUUID=64ddff0f-7636-11e5-bfa8-f5004e6195f0 is synchronized with live-server.
>> > 08:53:03,777 INFO [org.apache.activemq.artemis.core.server] AMQ221031: backup announced
>> > 08:55:59,292 INFO [org.apache.activemq.artemis.core.server] AMQ221037: ActiveMQServerImpl::serverUUID=64ddff0f-7636-11e5-bfa8-f5004e6195f0 to become 'live'
>> > 08:55:59,302 WARN [org.apache.activemq.artemis.core.client] AMQ212004: Failed to connect to server.
>> > 08:55:59,778 INFO [org.apache.activemq.artemis.core.server] AMQ221003: trying to deploy queue jms.queue.exampleQueue
>> > 08:55:59,829 WARN [org.apache.activemq.artemis.core.client] AMQ212034: There are more than one servers on the network broadcasting the same node id. You will see this message exactly once (per node) if a node is restarted, in which case it can be safely ignored. But if it is logged continuously it means you really do have more than one node on the same network active concurrently with the same node id. This could occur if you have a backup node active at the same time as its live node. nodeID=64ddff0f-7636-11e5-bfa8-f5004e6195f0
>> > 08:55:59,836 INFO [org.apache.activemq.artemis.core.server] AMQ221007: Server is now live
>> > 08:55:59,869 INFO [org.apache.activemq.artemis.core.server] AMQ221020: Started Acceptor at broker3:61617 for protocols [CORE,MQTT,AMQP,HORNETQ,STOMP,OPENWIRE]
>> > 5. Start master again and observe the logs:
>> >    $ server0/bin/artemis-service start
>> >    Starting artemis-service
>> >    artemis-service is now running (7388)
>> >
>> >    $ tail -f server0/log/artemis.log
>> > 08:57:24,625 INFO [org.apache.activemq.artemis.core.server] AMQ221012: Using AIO Journal
>> > 08:57:24,694 INFO [org.apache.activemq.artemis.core.server] AMQ221043: Protocol module found: [artemis-server]. Adding protocol support for: CORE
>> > 08:57:24,702 INFO [org.apache.activemq.artemis.core.server] AMQ221043: Protocol module found: [artemis-amqp-protocol]. Adding protocol support for: AMQP
>> > 08:57:24,731 INFO [org.apache.activemq.artemis.core.server] AMQ221043: Protocol module found: [artemis-hornetq-protocol]. Adding protocol support for: HORNETQ
>> > 08:57:24,733 INFO [org.apache.activemq.artemis.core.server] AMQ221043: Protocol module found: [artemis-mqtt-protocol]. Adding protocol support for: MQTT
>> > 08:57:24,743 INFO [org.apache.activemq.artemis.core.server] AMQ221043: Protocol module found: [artemis-openwire-protocol]. Adding protocol support for: OPENWIRE
>> > 08:57:24,878 INFO [org.apache.activemq.artemis.core.server] AMQ221043: Protocol module found: [artemis-stomp-protocol]. Adding protocol support for: STOMP
>> > 08:57:25,082 INFO [org.apache.activemq.artemis.core.server] AMQ221109: Apache ActiveMQ Artemis Backup Server version 1.1.0 [null] started, waiting live to fail before it gets active
>> > 08:57:27,043 INFO [org.apache.activemq.artemis.core.server] AMQ221024: Backup server ActiveMQServerImpl::serverUUID=64ddff0f-7636-11e5-bfa8-f5004e6195f0 is synchronized with live-server.
>> > 08:57:27,948 INFO [org.apache.activemq.artemis.core.server] AMQ221031: backup announced
>> > 08:57:31,227 WARN [org.apache.activemq.artemis.core.client] AMQ212037: Connection failure has been detected: AMQ119015: The connection was disconnected because of server shutdown [code=DISCONNECTED]
>> > 08:57:31,252 WARN [org.apache.activemq.artemis.core.client] AMQ212037: Connection failure has been detected: AMQ119015: The connection was disconnected because of server shutdown [code=DISCONNECTED]
>> > 08:57:31,307 WARN [org.apache.activemq.artemis.core.client] AMQ212037: Connection failure has been detected: AMQ119015: The connection was disconnected because of server shutdown [code=DISCONNECTED]
>> > 08:57:31,339 INFO [org.apache.activemq.artemis.core.server] AMQ221037: ActiveMQServerImpl::serverUUID=64ddff0f-7636-11e5-bfa8-f5004e6195f0 to become 'live'
>> > 08:57:31,360 WARN [org.apache.activemq.artemis.core.client] AMQ212004: Failed to connect to server.
>> > 08:57:31,413 ERROR [org.apache.activemq.artemis.core.server] AMQ224008: Failed to store id: java.lang.IllegalStateException: Cannot find add info 1
>> >     at org.apache.activemq.artemis.core.journal.impl.JournalImpl.appendDeleteRecord(JournalImpl.java:799) [artemis-journal-1.1.0.jar:1.1.0]
>> >     at org.apache.activemq.artemis.core.journal.impl.JournalBase.appendDeleteRecord(JournalBase.java:183) [artemis-journal-1.1.0.jar:1.1.0]
>> >     at org.apache.activemq.artemis.core.journal.impl.JournalImpl.appendDeleteRecord(JournalImpl.java:79) [artemis-journal-1.1.0.jar:1.1.0]
>> >     at org.apache.activemq.artemis.core.persistence.impl.journal.JournalStorageManager.deleteID(JournalStorageManager.java:1194) [artemis-server-1.1.0.jar:1.1.0]
>> >     at org.apache.activemq.artemis.core.persistence.impl.journal.BatchingIDGenerator.deleteID(BatchingIDGenerator.java:152) [artemis-server-1.1.0.jar:1.1.0]
>> >     at org.apache.activemq.artemis.core.persistence.impl.journal.BatchingIDGenerator.cleanup(BatchingIDGenerator.java:75) [artemis-server-1.1.0.jar:1.1.0]
>> >     at org.apache.activemq.artemis.core.persistence.impl.journal.JournalStorageManager.loadBindingJournal(JournalStorageManager.java:1784) [artemis-server-1.1.0.jar:1.1.0]
>> >     at org.apache.activemq.artemis.core.server.impl.ActiveMQServerImpl.loadJournals(ActiveMQServerImpl.java:1625) [artemis-server-1.1.0.jar:1.1.0]
>> >     at org.apache.activemq.artemis.core.server.impl.ActiveMQServerImpl.initialisePart2(ActiveMQServerImpl.java:1535) [artemis-server-1.1.0.jar:1.1.0]
>> >     at org.apache.activemq.artemis.core.server.impl.SharedNothingBackupActivation.run(SharedNothingBackupActivation.java:249) [artemis-server-1.1.0.jar:1.1.0]
>> >     at java.lang.Thread.run(Thread.java:745) [rt.jar:1.8.0_60]
>> > 08:57:31,540 WARN [org.apache.activemq.artemis.core.server] AMQ222173: Queue jms.queue.exampleQueue is duplicated during reload. This queue will be renamed as jms.queue.exampleQueue-0
>> > 08:57:31,550 ERROR [org.apache.activemq.artemis.core.server] AMQ224000: Failure in initialisation: java.lang.IllegalStateException: Cursor 2 had already been created
>> >     at org.apache.activemq.artemis.core.paging.cursor.impl.PageCursorProviderImpl.createSubscription(PageCursorProviderImpl.java:97) [artemis-server-1.1.0.jar:1.1.0]
>> >     at org.apache.activemq.artemis.core.server.impl.PostOfficeJournalLoader.initQueues(PostOfficeJournalLoader.java:140) [artemis-server-1.1.0.jar:1.1.0]
>> >     at org.apache.activemq.artemis.core.server.impl.ActiveMQServerImpl.loadJournals(ActiveMQServerImpl.java:1631) [artemis-server-1.1.0.jar:1.1.0]
>> >     at org.apache.activemq.artemis.core.server.impl.ActiveMQServerImpl.initialisePart2(ActiveMQServerImpl.java:1535) [artemis-server-1.1.0.jar:1.1.0]
>> >     at org.apache.activemq.artemis.core.server.impl.SharedNothingBackupActivation.run(SharedNothingBackupActivation.java:249) [artemis-server-1.1.0.jar:1.1.0]
>> >     at java.lang.Thread.run(Thread.java:745) [rt.jar:1.8.0_60]
>> >
>> > On 19 October 2015 at 10:31, Mihkel Nõges wrote:
>> >
>> >> Hi Clebert,
>> >>
>> >> I have no other code to share with you than the example code in the Artemis 1.1.0 binary deployment package. I'm running org.apache.activemq.artemis.jms.example.ReplicatedFailbackExample and have only commented out the serverStart and killServer calls, which I am doing manually.
>> >>
>> >> I do not think I do any of the steps too fast, as I tail the server log files in parallel and see that everything is finished when I start the failback. I have waited many minutes in between.
>> >>
>> >> The only configuration change for the test is replacing the localhost addresses with broker3 to make the cluster accessible remotely.
>> >>
>> >> BR!
>> >> Mihkel
>> >>
>> >> On 18 October 2015 at 17:49, Clebert wrote:
>> >>
>> >>> I'm not on my computer now, but it sounds like you are doing a failback immediately after failing over. It takes some time (seconds) for the server to activate on the backup.
>> >>>
>> >>> Later the server will need to copy the data back before it can be activated in failback mode.
>> >>>
>> >>> It sounds like the live is not reaching the backup for failback.
>> >>>
>> >>> I will try looking at it on Monday. Maybe you could post your example at your GitHub fork.
>> >>>
>> >>> -- Clebert Suconic typing on the iPhone.
>> >>>
>> >>> > On Oct 18, 2015, at 08:15, Mihkel Nõges <mihkel.noges@transferwise.com> wrote:
>> >>> >
>> >>> > Hello again!
>> >>> >
>> >>> > I would be very grateful if someone could answer my questions. We need high availability to work in order to use the broker in production.
>> >>> >
>> >>> > When I run the replicated-failback example on one machine (broker3), it succeeds.
>> >>> >
>> >>> > It fails when I run the same test remotely: exactly the same servers, with a slightly modified client.
>> >>> >
>> >>> > I run the client in debug mode from my IDE with the serverStart and killServer calls commented out.
>> >>> > Deleted data folders and started the servers:
>> >>> > artemis@broker3:/opt/apache-artemis-1.1.0/examples/features/ha/replicated-failback/target$ rm -R server0/data/
>> >>> > artemis@broker3:/opt/apache-artemis-1.1.0/examples/features/ha/replicated-failback/target$ rm -R server1/data/
>> >>> > artemis@broker3:/opt/apache-artemis-1.1.0/examples/features/ha/replicated-failback/target$ server0/bin/artemis-service start
>> >>> > Starting artemis-service
>> >>> > artemis-service is now running (23357)
>> >>> > artemis@broker3:/opt/apache-artemis-1.1.0/examples/features/ha/replicated-failback/target$ server1/bin/artemis-service start
>> >>> > Starting artemis-service
>> >>> > artemis-service is now running (23383)
>> >>> >
>> >>> > Starting client and stopping on breakpoint at line 103:
>> >>> > //ServerUtil.killServer(server0);
>> >>> > // Step 11. Acknowledging the 2nd half of the sent messages will fail as failover to the
>> >>> > // backup server has occurred
>> >>> > try {
>> >>> > message0.acknowledge(); //line 103
>> >>> >
>> >>> > Killing server0:
>> >>> > artemis@broker3:/opt/apache-artemis-1.1.0/examples/features/ha/replicated-failback/target$ kill -9 23357
>> >>> >
>> >>> > Proceeding to breakpoint at line 121:
>> >>> > //server0 = ServerUtil.startServer(args[0], ReplicatedFailbackExample.class.getSimpleName() + "0", 0, 10000);
>> >>> >
>> >>> > // Step 11. Acknowledging the 2nd half of the sent messages will fail as failover to the
>> >>> > // backup server has occurred
>> >>> > try {
>> >>> > message0.acknowledge(); // line 121
>> >>> >
>> >>> > Starting server0:
>> >>> > artemis@broker3:/opt/apache-artemis-1.1.0/examples/features/ha/replicated-failback/target$ server0/bin/artemis-service start
>> >>> > Starting artemis-service
>> >>> > artemis-service is now running (24240)
>> >>> >
>> >>> > Server0 writes ERROR to its log (see attached server0_artemis.log).
>> >>> > Now when I try to proceed with the client, it writes the following in the log and does not exit, but hangs forever:
>> >>> >
>> >>> > Oct 18, 2015 2:55:34 PM org.apache.activemq.artemis.core.protocol.core.impl.RemotingConnectionImpl fail
>> >>> > WARN: AMQ212037: Connection failure has been detected: AMQ119015: The connection was disconnected because of server shutdown [code=DISCONNECTED]
>> >>> > Got message: This is text message 20 (redelivered?: false)
>> >>> > Got exception while acknowledging message: AMQ119014: Timed out after waiting 30,000 ms for response when sending packet 43
>> >>> > Got message: This is text message 21 (redelivered?: false)
>> >>> > Got message: This is text message 22 (redelivered?: false)
>> >>> > Got message: This is text message 23 (redelivered?: false)
>> >>> > Got message: This is text message 24 (redelivered?: false)
>> >>> > Got message: This is text message 25 (redelivered?: false)
>> >>> > Got message: This is text message 26 (redelivered?: false)
>> >>> > Got message: This is text message 27 (redelivered?: false)
>> >>> > Got message: This is text message 28 (redelivered?: false)
>> >>> > Got message: This is text message 29 (redelivered?: false)
>> >>> >
>> >>> > As a result the slave (server1) remains stopped, not restarted as expected, and the master (server0) process appears to be running but does not accept any connections.
>> >>> >
>> >>> > Exactly the same behavior is observable every time I try this.
>> >>> >
>> >>> > BR!
>> >>> > Mihkel
>> >>> >
>> >>> >> On 13 October 2015 at 20:17, Mihkel Nõges <mihkel.noges@transferwise.com> wrote:
>> >>> >> Hi Clebert,
>> >>> >>
>> >>> >> No test, just doing it on the command line with standalone servers. I'm using 1.1.0 installed with wget, not the snapshot.
>> >>> >>
>> >>> >> I'm wondering what the suggested procedure is for admins to make changes to an HA cluster of 2 or 3 Artemis nodes. If one of the nodes is master by configuration, do they need to change its config so that it restarts as a slave, to keep the change process seamless, and make some instance master by configuration only if all the instances need to be restarted?
>> >>> >>
>> >>> >> I also tried a cluster with 2 masters and 2 slaves with 2 separate group-name values, but for some reason the second master I started immediately became a slave of the first. I expected it to become a clustered, load-balancing parallel master. Our loads are not yet high enough to require more than one master, so it's just a theoretical question.
>> >>> >>
>> >>> >> BR!
>> >>> >> Mihkel
>> >>> >>
>> >>> >>> On 13 October 2015 at 20:03, Clebert Suconic <clebert.suconic@gmail.com> wrote:
>> >>> >>> The master needs to copy its data from the backup back to live before it's activated.
>> >>> >>>
>> >>> >>> Do you have a test replicating this?
>> >>> >>>
>> >>> >>> Did you try the snapshot build?
>> >>> >>>
>> >>> >>> On Tue, Oct 13, 2015 at 11:58 AM, Mihkel Nõges wrote:
>> >>> >>> > Hi,
>> >>> >>> >
>> >>> >>> > I configured replicating HA master-slave of Artemis 1.1.0 instances on Ubuntu 14.04.3.
>> >>> >>> >
>> >>> >>> > When I kill the master, the slave takes over as expected and starts serving as the new master. When I then start the old master, it fails with the following errors in the log:
>> >>> >>> >
>> >>> >>> > 16:35:46,476 ERROR [org.apache.activemq.artemis.core.server] AMQ224008: Failed to store id: java.lang.IllegalStateException: Cannot find add info 1
>> >>> >>> >     at org.apache.activemq.artemis.core.journal.impl.JournalImpl.appendDeleteRecord(JournalImpl.java:799) [artemis-journal-1.1.0.jar:1.1.0]
>> >>> >>> >     at org.apache.activemq.artemis.core.journal.impl.JournalBase.appendDeleteRecord(JournalBase.java:183) [artemis-journal-1.1.0.jar:1.1.0]
>> >>> >>> >     at org.apache.activemq.artemis.core.journal.impl.JournalImpl.appendDeleteRecord(JournalImpl.java:79) [artemis-journal-1.1.0.jar:1.1.0]
>> >>> >>> >     at org.apache.activemq.artemis.core.persistence.impl.journal.JournalStorageManager.deleteID(JournalStorageManager.java:1194) [artemis-server-1.1.0.jar:1.1.0]
>> >>> >>> >     at org.apache.activemq.artemis.core.persistence.impl.journal.BatchingIDGenerator.deleteID(BatchingIDGenerator.java:152) [artemis-server-1.1.0.jar:1.1.0]
>> >>> >>> >     at org.apache.activemq.artemis.core.persistence.impl.journal.BatchingIDGenerator.cleanup(BatchingIDGenerator.java:75) [artemis-server-1.1.0.jar:1.1.0]
>> >>> >>> >     at org.apache.activemq.artemis.core.persistence.impl.journal.JournalStorageManager.loadBindingJournal(JournalStorageManager.java:1784) [artemis-server-1.1.0.jar:1.1.0]
>> >>> >>> >     at org.apache.activemq.artemis.core.server.impl.ActiveMQServerImpl.loadJournals(ActiveMQServerImpl.java:1625) [artemis-server-1.1.0.jar:1.1.0]
>> >>> >>> >     at org.apache.activemq.artemis.core.server.impl.ActiveMQServerImpl.initialisePart2(ActiveMQServerImpl.java:1535) [artemis-server-1.1.0.jar:1.1.0]
>> >>> >>> >     at org.apache.activemq.artemis.core.server.impl.SharedNothingBackupActivation.run(SharedNothingBackupActivation.java:249) [artemis-server-1.1.0.jar:1.1.0]
>> >>> >>> >     at java.lang.Thread.run(Thread.java:745) [rt.jar:1.8.0_60]
>> >>> >>> >
>> >>> >>> > 16:35:46,572 WARN [org.apache.activemq.artemis.core.server] AMQ222173: Queue jms.queue.DLQ is duplicated during reload. This queue will be renamed as jms.queue.DLQ-0
>> >>> >>> > 16:35:46,572 ERROR [org.apache.activemq.artemis.core.server] AMQ224000: Failure in initialisation: java.lang.IllegalStateException: Cursor 2 had already been created
>> >>> >>> >     at org.apache.activemq.artemis.core.paging.cursor.impl.PageCursorProviderImpl.createSubscription(PageCursorProviderImpl.java:97) [artemis-server-1.1.0.jar:1.1.0]
>> >>> >>> >     at org.apache.activemq.artemis.core.server.impl.PostOfficeJournalLoader.initQueues(PostOfficeJournalLoader.java:140) [artemis-server-1.1.0.jar:1.1.0]
>> >>> >>> >     at org.apache.activemq.artemis.core.server.impl.ActiveMQServerImpl.loadJournals(ActiveMQServerImpl.java:1631) [artemis-server-1.1.0.jar:1.1.0]
>> >>> >>> >     at org.apache.activemq.artemis.core.server.impl.ActiveMQServerImpl.initialisePart2(ActiveMQServerImpl.java:1535) [artemis-server-1.1.0.jar:1.1.0]
>> >>> >>> >     at org.apache.activemq.artemis.core.server.impl.SharedNothingBackupActivation.run(SharedNothingBackupActivation.java:249) [artemis-server-1.1.0.jar:1.1.0]
>> >>> >>> >     at java.lang.Thread.run(Thread.java:745) [rt.jar:1.8.0_60]
>> >>> >>> >
>> >>> >>> > As a result both the master and the slave remain inaccessible, and no further restarts solve the situation.
>> >>> >>> >
>> >>> >>> > Attached also master and slave broker.xml files.
>> >>> >>> >
>> >>> >>> > BR!
>> >>> >>> >
>> >>> >>> > Mihkel Nõges
>> >>> >>>
>> >>> >>> --
>> >>> >>> Clebert Suconic
>>
>> --
>> Clebert Suconic

-- 
Clebert Suconic
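
For anyone repeating steps 4 and 5 from the thread, here is a minimal shell sketch that waits for the backup to log AMQ221007 ("Server is now live") before the old master is started again. The log path, the message code, and the artemis-service start command are taken from the messages above; the function names wait_for_log and restart_master_after_takeover, the 60-second timeout, and the one-second polling interval are assumptions made for illustration. It does not work around the failback regression; it only rules out restarting the master too early.

```shell
#!/bin/sh
# wait_for_log FILE PATTERN TIMEOUT_SECONDS
# Polls FILE about once per second until a line matching PATTERN appears.
# Returns 0 on a match and 1 on timeout. (Hypothetical helper, not part of Artemis.)
wait_for_log() {
    file=$1
    pattern=$2
    timeout=$3
    elapsed=0
    while [ "$elapsed" -lt "$timeout" ]; do
        if [ -f "$file" ] && grep -q "$pattern" "$file"; then
            return 0
        fi
        sleep 1
        elapsed=$((elapsed + 1))
    done
    return 1
}

# restart_master_after_takeover TARGET_DIR MASTER_PID
# Kills the master, waits for the backup to report it is live, then starts
# the master again. Assumes the server0/server1 layout of the
# replicated-failback example and a backup log that does not already contain
# AMQ221007 from an earlier run.
restart_master_after_takeover() {
    target=$1
    master_pid=$2
    kill -9 "$master_pid"
    if wait_for_log "$target/server1/log/artemis.log" "AMQ221007" 60; then
        "$target/server0/bin/artemis-service" start
    else
        echo "backup did not report AMQ221007 within 60s; master not restarted" >&2
        return 1
    fi
}
```

It could be called as restart_master_after_takeover /opt/apache-artemis-1.1.0/examples/features/ha/replicated-failback/target 7154, using the PID printed when server0 was started.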