Subject: Re: [Artemis] Master fails to start up after failback
From: Clebert Suconic
To: users@activemq.apache.org
Date: Thu, 22 Oct 2015 17:23:33 -0400

On Thu, Oct 22, 2015 at 4:22 AM, Mihkel Nõges wrote:
> Hi Martyn, Clebert,
>
> Thanks for fixing it and for good suggestions! I think the best way still
> is to not use failback in production at all and instead change the master
> configuration to be slave after the master crashes or is stopped and needs
> to be restarted. I guess this is the safer way of doing failover
> anyway?

Simpler is always better. If you had a good monitoring infrastructure in
place, I would actually prefer to restart the same server. And note that
this goes for any messaging solution you choose. We provide the bits for
the best user choices...
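As a rough illustration of the kind of monitoring check meant here, the sketch below tests whether a broker's acceptor port accepts TCP connections, which is not the same as checking that the process is alive (the thread describes a master whose process was running but refused connections). This is only a sketch under stated assumptions: it is not part of Artemis, and the class name `BrokerPortCheck`, the method `isAccepting`, and the 2-second timeout are made up for the example.

```java
import java.io.IOException;
import java.net.InetSocketAddress;
import java.net.Socket;

// Hypothetical helper, not part of Artemis: probes a host/port over plain TCP.
public class BrokerPortCheck {

    /** Returns true if a TCP connection to host:port succeeds within timeoutMillis. */
    public static boolean isAccepting(String host, int port, int timeoutMillis) {
        try (Socket socket = new Socket()) {
            // connect() throws an IOException on refusal, timeout, or unknown host
            socket.connect(new InetSocketAddress(host, port), timeoutMillis);
            return true;
        } catch (IOException e) {
            return false;
        }
    }

    public static void main(String[] args) {
        if (args.length < 2) {
            System.out.println("usage: java BrokerPortCheck <host> <port> [<port> ...]");
            return;
        }
        String host = args[0];
        for (int i = 1; i < args.length; i++) {
            int port = Integer.parseInt(args[i]);
            // 2000 ms is an arbitrary timeout chosen for this example
            boolean up = isAccepting(host, port, 2000);
            System.out.println(host + ":" + port + " accepting connections: " + up);
        }
    }
}
```

After compiling, `java BrokerPortCheck broker3 61617` would probe the acceptor address that appears in the quoted server1 log (broker3:61617); the port of the other server is not given in this thread and would have to be taken from its own configuration. A successful connect only shows that the port is open, so a real monitor would likely also send and receive a test message.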
>
> I think it would help a lot to have suggested deployment layouts and
> maintenance procedures chapter in Artemis documentation to avoid
> inexperienced users like me trying to use the broker in unorthodox ways
> after reading the existing documentation.

Well, you can never replace experience and people doing consulting
services. The more we document, the more questions people will ask...

As you are getting to know the server better, we value your opinions, and
the best way to contribute would be with patches on docs, bugs... anything.

We always welcome pull requests, though. If you see any improvements,
please send them in... :)

Also, keep in mind there are guys here who could help you with any
issues. Commercial support is always an alternative, and there are a few
folks listed here: http://activemq.apache.org/support.html

>
> BR!
> Mihkel
>
>
> On 21 October 2015 at 19:38, Clebert Suconic wrote:
>
>> another possible workaround is to start the server from where your
>> paths are relative from.
>>
>> Or you could try the snapshot build:
>>
>> https://repository.apache.org/content/repositories/snapshots/org/apache/activemq/apache-artemis/1.1.1-SNAPSHOT/apache-artemis-1.1.1-20151021.162952-18-bin.zip
>>
>> On Wed, Oct 21, 2015 at 12:17 PM, Martyn Taylor wrote:
>> > This should now be fixed upstream as part of:
>> > https://issues.apache.org/jira/browse/ARTEMIS-273
>> >
>> >
>> > On 20/10/15 19:29, Mihkel Nõges wrote:
>> >>
>> >> Thanks Martyn!
>> >>
>> >> I will try this tomorrow.
>> >>
>> >> BR!
>> >> Mihkel
>> >>
>> >> On 20 October 2015 at 18:59, Martyn Taylor wrote:
>> >>
>> >>> Hi Mihkel,
>> >>>
>> >>> I tried reproducing this locally and ran into an issue straight away when
>> >>> running the example.
>> >>> After some investigation it appears that there is an
>> >>> issue in the ArtemisServerImpl which is preventing the journal files from
>> >>> being replicated properly when using relative paths in the configuration
>> >>> for data directories.
>> >>>
>> >>> I am working on a fix for this at the moment. In the meantime, you could
>> >>> try using absolute paths in your server configuration for the following
>> >>> elements:
>> >>>
>> >>>
>> >>>
>> >>>
>> >>>
>> >>>
>> >>> Regards
>> >>> Martyn
>> >>>
>> >>> On 19/10/15 09:04, Mihkel Nõges wrote:
>> >>>
>> >>>> Basic flow of getting unresponsive failback cluster:
>> >>>> Have machine with Ubuntu 14.04.3
>> >>>>
>> >>>> 1. Install libaio1, Java 1.8.0_60, maven 3.3.3, download and extract
>> >>>>    apache-artemis-1.1.0-bin
>> >>>>    <http://www.eu.apache.org/dist/activemq/activemq-artemis/1.1.0/apache-artemis-1.1.0-bin.tar.gz>
>> >>>>    in /opt
>> >>>> 2. run $ mvn -Prelease install and $ mvn verify in
>> >>>>    /opt/apache-artemis-1.1.0/examples/features/ha/replicated-failback
>> >>>>    SUCCESS
>> >>>> 3. Clean data folders and start both servers manually:
>> >>>>    $ cd /opt/apache-artemis-1.1.0/examples/features/ha/replicated-failback/target
>> >>>>    $ rm -R server0/data/
>> >>>>    $ rm -R server1/data/
>> >>>>    $ server0/bin/artemis-service start
>> >>>>    Starting artemis-service
>> >>>>    artemis-service is now running (7154)
>> >>>>    $ server1/bin/artemis-service start
>> >>>>    Starting artemis-service
>> >>>>    artemis-service is now running (7180)
>> >>>> 4. Kill master server and wait for slave to take over
>> >>>>
>> >>>>    $ kill -9 7154
>> >>>>
>> >>>>    $ tail -f server1/log/artemis.log
>> >>>>    08:52:54,798 INFO [org.apache.activemq.artemis.core.server] AMQ221043: Protocol module found: [artemis-stomp-protocol]. Adding protocol support for: STOMP
>> >>>>    08:53:02,145 INFO [org.apache.activemq.artemis.core.server] AMQ221109: Apache ActiveMQ Artemis Backup Server version 1.1.0 [null] started, waiting live to fail before it gets active
>> >>>>    08:53:03,582 INFO [org.apache.activemq.artemis.core.server] AMQ221024: Backup server ActiveMQServerImpl::serverUUID=64ddff0f-7636-11e5-bfa8-f5004e6195f0 is synchronized with live-server.
>> >>>>    08:53:03,777 INFO [org.apache.activemq.artemis.core.server] AMQ221031: backup announced
>> >>>>    08:55:59,292 INFO [org.apache.activemq.artemis.core.server] AMQ221037: ActiveMQServerImpl::serverUUID=64ddff0f-7636-11e5-bfa8-f5004e6195f0 to become 'live'
>> >>>>    08:55:59,302 WARN [org.apache.activemq.artemis.core.client] AMQ212004: Failed to connect to server.
>> >>>>    08:55:59,778 INFO [org.apache.activemq.artemis.core.server] AMQ221003: trying to deploy queue jms.queue.exampleQueue
>> >>>>    08:55:59,829 WARN [org.apache.activemq.artemis.core.client] AMQ212034: There are more than one servers on the network broadcasting the same node id. You will see this message exactly once (per node) if a node is restarted, in which case it can be safely ignored. But if it is logged continuously it means you really do have more than one node on the same network active concurrently with the same node id. This could occur if you have a backup node active at the same time as its live node. nodeID=64ddff0f-7636-11e5-bfa8-f5004e6195f0
>> >>>>    08:55:59,836 INFO [org.apache.activemq.artemis.core.server] AMQ221007: Server is now live
>> >>>>    08:55:59,869 INFO [org.apache.activemq.artemis.core.server] AMQ221020: Started Acceptor at broker3:61617 for protocols [CORE,MQTT,AMQP,HORNETQ,STOMP,OPENWIRE]
>> >>>> 5. Start master again and observe the logs:
>> >>>>    $ server0/bin/artemis-service start
>> >>>>    Starting artemis-service
>> >>>>    artemis-service is now running (7388)
>> >>>>
>> >>>>    $ tail -f server0/log/artemis.log
>> >>>>    08:57:24,625 INFO [org.apache.activemq.artemis.core.server] AMQ221012: Using AIO Journal
>> >>>>    08:57:24,694 INFO [org.apache.activemq.artemis.core.server] AMQ221043: Protocol module found: [artemis-server]. Adding protocol support for: CORE
>> >>>>    08:57:24,702 INFO [org.apache.activemq.artemis.core.server] AMQ221043: Protocol module found: [artemis-amqp-protocol]. Adding protocol support for: AMQP
>> >>>>    08:57:24,731 INFO [org.apache.activemq.artemis.core.server] AMQ221043: Protocol module found: [artemis-hornetq-protocol]. Adding protocol support for: HORNETQ
>> >>>>    08:57:24,733 INFO [org.apache.activemq.artemis.core.server] AMQ221043: Protocol module found: [artemis-mqtt-protocol]. Adding protocol support for: MQTT
>> >>>>    08:57:24,743 INFO [org.apache.activemq.artemis.core.server] AMQ221043: Protocol module found: [artemis-openwire-protocol]. Adding protocol support for: OPENWIRE
>> >>>>    08:57:24,878 INFO [org.apache.activemq.artemis.core.server] AMQ221043: Protocol module found: [artemis-stomp-protocol]. Adding protocol support for: STOMP
>> >>>>    08:57:25,082 INFO [org.apache.activemq.artemis.core.server] AMQ221109: Apache ActiveMQ Artemis Backup Server version 1.1.0 [null] started, waiting live to fail before it gets active
>> >>>>    08:57:27,043 INFO [org.apache.activemq.artemis.core.server] AMQ221024: Backup server ActiveMQServerImpl::serverUUID=64ddff0f-7636-11e5-bfa8-f5004e6195f0 is synchronized with live-server.
>> >>>>    08:57:27,948 INFO [org.apache.activemq.artemis.core.server] AMQ221031: backup announced
>> >>>>    08:57:31,227 WARN [org.apache.activemq.artemis.core.client] AMQ212037: Connection failure has been detected: AMQ119015: The connection was disconnected because of server shutdown [code=DISCONNECTED]
>> >>>>    08:57:31,252 WARN [org.apache.activemq.artemis.core.client] AMQ212037: Connection failure has been detected: AMQ119015: The connection was disconnected because of server shutdown [code=DISCONNECTED]
>> >>>>    08:57:31,307 WARN [org.apache.activemq.artemis.core.client] AMQ212037: Connection failure has been detected: AMQ119015: The connection was disconnected because of server shutdown [code=DISCONNECTED]
>> >>>>    08:57:31,339 INFO [org.apache.activemq.artemis.core.server] AMQ221037: ActiveMQServerImpl::serverUUID=64ddff0f-7636-11e5-bfa8-f5004e6195f0 to become 'live'
>> >>>>    08:57:31,360 WARN [org.apache.activemq.artemis.core.client] AMQ212004: Failed to connect to server.
>> >>>>    08:57:31,413 ERROR [org.apache.activemq.artemis.core.server] AMQ224008: Failed to store id: java.lang.IllegalStateException: Cannot find add info 1
>> >>>>        at org.apache.activemq.artemis.core.journal.impl.JournalImpl.appendDeleteRecord(JournalImpl.java:799) [artemis-journal-1.1.0.jar:1.1.0]
>> >>>>        at org.apache.activemq.artemis.core.journal.impl.JournalBase.appendDeleteRecord(JournalBase.java:183) [artemis-journal-1.1.0.jar:1.1.0]
>> >>>>        at org.apache.activemq.artemis.core.journal.impl.JournalImpl.appendDeleteRecord(JournalImpl.java:79) [artemis-journal-1.1.0.jar:1.1.0]
>> >>>>        at org.apache.activemq.artemis.core.persistence.impl.journal.JournalStorageManager.deleteID(JournalStorageManager.java:1194) [artemis-server-1.1.0.jar:1.1.0]
>> >>>>        at org.apache.activemq.artemis.core.persistence.impl.journal.BatchingIDGenerator.deleteID(BatchingIDGenerator.java:152) [artemis-server-1.1.0.jar:1.1.0]
>> >>>>        at org.apache.activemq.artemis.core.persistence.impl.journal.BatchingIDGenerator.cleanup(BatchingIDGenerator.java:75) [artemis-server-1.1.0.jar:1.1.0]
>> >>>>        at org.apache.activemq.artemis.core.persistence.impl.journal.JournalStorageManager.loadBindingJournal(JournalStorageManager.java:1784) [artemis-server-1.1.0.jar:1.1.0]
>> >>>>        at org.apache.activemq.artemis.core.server.impl.ActiveMQServerImpl.loadJournals(ActiveMQServerImpl.java:1625) [artemis-server-1.1.0.jar:1.1.0]
>> >>>>        at org.apache.activemq.artemis.core.server.impl.ActiveMQServerImpl.initialisePart2(ActiveMQServerImpl.java:1535) [artemis-server-1.1.0.jar:1.1.0]
>> >>>>        at org.apache.activemq.artemis.core.server.impl.SharedNothingBackupActivation.run(SharedNothingBackupActivation.java:249) [artemis-server-1.1.0.jar:1.1.0]
>> >>>>        at java.lang.Thread.run(Thread.java:745) [rt.jar:1.8.0_60]
>> >>>>    08:57:31,540 WARN [org.apache.activemq.artemis.core.server] AMQ222173: Queue jms.queue.exampleQueue is duplicated during reload. This queue will be renamed as jms.queue.exampleQueue-0
>> >>>>    08:57:31,550 ERROR [org.apache.activemq.artemis.core.server] AMQ224000: Failure in initialisation: java.lang.IllegalStateException: Cursor 2 had already been created
>> >>>>        at org.apache.activemq.artemis.core.paging.cursor.impl.PageCursorProviderImpl.createSubscription(PageCursorProviderImpl.java:97) [artemis-server-1.1.0.jar:1.1.0]
>> >>>>        at org.apache.activemq.artemis.core.server.impl.PostOfficeJournalLoader.initQueues(PostOfficeJournalLoader.java:140) [artemis-server-1.1.0.jar:1.1.0]
>> >>>>        at org.apache.activemq.artemis.core.server.impl.ActiveMQServerImpl.loadJournals(ActiveMQServerImpl.java:1631) [artemis-server-1.1.0.jar:1.1.0]
>> >>>>        at org.apache.activemq.artemis.core.server.impl.ActiveMQServerImpl.initialisePart2(ActiveMQServerImpl.java:1535) [artemis-server-1.1.0.jar:1.1.0]
>> >>>>        at org.apache.activemq.artemis.core.server.impl.SharedNothingBackupActivation.run(SharedNothingBackupActivation.java:249) [artemis-server-1.1.0.jar:1.1.0]
>> >>>>        at java.lang.Thread.run(Thread.java:745) [rt.jar:1.8.0_60]
>> >>>>
>> >>>>
>> >>>> On 19 October 2015 at 10:31, Mihkel Nõges wrote:
>> >>>>
>> >>>>> Hi Clebert,
>> >>>>>
>> >>>>> I do not have other code to share with you but the example code in
>> >>>>> Artemis 1.1.0 binary deployment package.
>> >>>>> I'm running
>> >>>>> org.apache.activemq.artemis.jms.example.ReplicatedFailbackExample
>> >>>>>
>> >>>>> And only commented out the serverStart and killServer calls which I am
>> >>>>> doing manually.
>> >>>>>
>> >>>>> I do not think I do any of the steps too fast as I tail the server log
>> >>>>> files in parallel and see everything is finished when I start the fail
>> >>>>> back. I have waited many minutes in between.
>> >>>>>
>> >>>>> The only change in configuration to the test is replacing localhost
>> >>>>> addresses with broker3 to make the cluster accessible remotely.
>> >>>>>
>> >>>>> BR!
>> >>>>> Mihkel
>> >>>>>
>> >>>>> On 18 October 2015 at 17:49, Clebert wrote:
>> >>>>>
>> >>>>>> I'm not on my computer now but it sounds like you are doing a fail back
>> >>>>>> immediately after failing over. It takes some time (seconds) for the
>> >>>>>> server to activate on the backup.
>> >>>>>>
>> >>>>>> Later the server will need to copy the data back before it can be
>> >>>>>> activated in fail back mode.
>> >>>>>>
>> >>>>>> It sounds like the live is not reaching the backup for fail back.
>> >>>>>>
>> >>>>>> I will try looking at it on Monday. Maybe you could post your
>> >>>>>> example at your GitHub fork.
>> >>>>>>
>> >>>>>> -- Clebert Suconic typing on the iPhone.
>> >>>>>>
>> >>>>>> On Oct 18, 2015, at 08:15, Mihkel Nõges wrote:
>> >>>>>>
>> >>>>>>> Hello again!
>> >>>>>>>
>> >>>>>>> I would be very grateful if someone could answer my questions. We need
>> >>>>>>> the high availability to work to use the broker in production.
>> >>>>>>>
>> >>>>>>> When I run the replicated-failback example in one machine (broker3) it
>> >>>>>>> succeeds.
>> >>>>>>>
>> >>>>>>> It fails when I run the same test - exactly the same servers with
>> >>>>>>> slightly modified client remotely.
>> >>>>>>>
>> >>>>>>> I run client in debug mode from my IDE with commented out serverStart
>> >>>>>>> and killServer calls.
>> >>>>>>>
>> >>>>>>> Deleted data folders and started the servers:
>> >>>>>>> artemis@broker3:/opt/apache-artemis-1.1.0/examples/features/ha/replicated-failback/target$ rm -R server0/data/
>> >>>>>>> artemis@broker3:/opt/apache-artemis-1.1.0/examples/features/ha/replicated-failback/target$ rm -R server1/data/
>> >>>>>>> artemis@broker3:/opt/apache-artemis-1.1.0/examples/features/ha/replicated-failback/target$ server0/bin/artemis-service start
>> >>>>>>> Starting artemis-service
>> >>>>>>> artemis-service is now running (23357)
>> >>>>>>> artemis@broker3:/opt/apache-artemis-1.1.0/examples/features/ha/replicated-failback/target$ server1/bin/artemis-service start
>> >>>>>>> Starting artemis-service
>> >>>>>>> artemis-service is now running (23383)
>> >>>>>>>
>> >>>>>>> Starting client and stopping on breakpoint at line 103:
>> >>>>>>> //ServerUtil.killServer(server0);
>> >>>>>>> // Step 11. Acknowledging the 2nd half of the sent messages will fail as failover to the
>> >>>>>>> // backup server has occurred
>> >>>>>>> try {
>> >>>>>>> message0.acknowledge(); //line 103
>> >>>>>>> killing server0
>> >>>>>>> artemis@broker3:/opt/apache-artemis-1.1.0/examples/features/ha/replicated-failback/target$ kill -9 23357
>> >>>>>>> Proceeding to breakpoint at line 121:
>> >>>>>>> //server0 = ServerUtil.startServer(args[0], ReplicatedFailbackExample.class.getSimpleName() + "0", 0, 10000);
>> >>>>>>> // Step 11. Acknowledging the 2nd half of the sent messages will fail as failover to the
>> >>>>>>> // backup server has occurred
>> >>>>>>> try {
>> >>>>>>> message0.acknowledge(); // line 121
>> >>>>>>> Starting server0:
>> >>>>>>> artemis@broker3:/opt/apache-artemis-1.1.0/examples/features/ha/replicated-failback/target$ server0/bin/artemis-service start
>> >>>>>>> Starting artemis-service
>> >>>>>>> artemis-service is now running (24240)
>> >>>>>>>
>> >>>>>>> Server0 writes ERROR to its log (see attached server0_artemis.log).
>> >>>>>>> Now when trying to proceed with the client it writes the following in
>> >>>>>>> the log and does not exit, but remains hanging forever:
>> >>>>>>> Oct 18, 2015 2:55:34 PM org.apache.activemq.artemis.core.protocol.core.impl.RemotingConnectionImpl fail
>> >>>>>>> WARN: AMQ212037: Connection failure has been detected: AMQ119015: The connection was disconnected because of server shutdown [code=DISCONNECTED]
>> >>>>>>> Got message: This is text message 20 (redelivered?: false)
>> >>>>>>> Got exception while acknowledging message: AMQ119014: Timed out after waiting 30,000 ms for response when sending packet 43
>> >>>>>>> Got message: This is text message 21 (redelivered?: false)
>> >>>>>>> Got message: This is text message 22 (redelivered?: false)
>> >>>>>>> Got message: This is text message 23 (redelivered?: false)
>> >>>>>>> Got message: This is text message 24 (redelivered?: false)
>> >>>>>>> Got message: This is text message 25 (redelivered?: false)
>> >>>>>>> Got message: This is text message 26 (redelivered?: false)
>> >>>>>>> Got message: This is text message 27 (redelivered?: false)
>> >>>>>>> Got message: This is text message 28 (redelivered?: false)
>> >>>>>>> Got message: This is text message 29 (redelivered?: false)
>> >>>>>>>
>> >>>>>>> As a result the slave (server1) remains stopped, not restarted as
>> >>>>>>> expected and the master (server0) process appears to be running but does
>> >>>>>>> not accept any connections.
>> >>>>>>>
>> >>>>>>> Exactly the same behavior is observable every time I try this.
>> >>>>>>>
>> >>>>>>> BR!
>> >>>>>>> Mihkel
>> >>>>>>>
>> >>>>>>> On 13 October 2015 at 20:17, Mihkel Nõges <mihkel.noges@transferwise.com> wrote:
>> >>>>>>>
>> >>>>>>>> Hi Clebert,
>> >>>>>>>>
>> >>>>>>>> No test, just doing it on command line with standalone servers. I'm
>> >>>>>>>> using 1.1.0 installed with wget, not the snapshot.
>> >>>>>>>>
>> >>>>>>>> I'm wondering what should be the suggested procedure for admins to do
>> >>>>>>>> changes to HA cluster of 2 or 3 nodes of Artemis. If one of the nodes is
>> >>>>>>>> master by configuration, do they need to change its config before
>> >>>>>>>> restarting it to become slave to have seamless change process and make some
>> >>>>>>>> instance master by configuration only if all the instances need to be
>> >>>>>>>> restarted?
>> >>>>>>>>
>> >>>>>>>> I tried also a cluster with 2 masters and 2 slaves with 2 separate
>> >>>>>>>> group-name values, but for some reason the second master I started became
>> >>>>>>>> slave for the first immediately. I expected it to become a clustered
>> >>>>>>>> load-balancing parallel master. Our loads are not yet that high to require
>> >>>>>>>> more than one master, so it's just a theoretical question.
>> >>>>>>>>
>> >>>>>>>> BR!
>> >>>>>>>>
>> >>>>>>>> Mihkel
>> >>>>>>>>
>> >>>>>>>> On 13 October 2015 at 20:03, Clebert Suconic <clebert.suconic@gmail.com> wrote:
>> >>>>>>>>
>> >>>>>>>>> The master needs to copy its data from the backup back to live before
>> >>>>>>>>> it's activated.
>> >>>>>>>>>
>> >>>>>>>>> Do you have a test replicating this?
>> >>>>>>>>>
>> >>>>>>>>> Did you try the snapshot build?
>> >>>>>>>>>
>> >>>>>>>>> On Tue, Oct 13, 2015 at 11:58 AM, Mihkel Nõges wrote:
>> >>>>>>>>>
>> >>>>>>>>>> Hi,
>> >>>>>>>>>>
>> >>>>>>>>>> I configured replicating HA master-slave of Artemis 1.1.0 instances on
>> >>>>>>>>>> Ubuntu 14.04.3.
>> >>>>>>>>>>
>> >>>>>>>>>> When I kill master the slave takes over as expected and starts serving as
>> >>>>>>>>>> new master. When I then start the old master, it fails with the following
>> >>>>>>>>>> errors in the log:
>> >>>>>>>>>>
>> >>>>>>>>>> 16:35:46,476 ERROR [org.apache.activemq.artemis.core.server] AMQ224008: Failed to store id: java.lang.IllegalStateException: Cannot find add info 1
>> >>>>>>>>>>     at org.apache.activemq.artemis.core.journal.impl.JournalImpl.appendDeleteRecord(JournalImpl.java:799) [artemis-journal-1.1.0.jar:1.1.0]
>> >>>>>>>>>>     at org.apache.activemq.artemis.core.journal.impl.JournalBase.appendDeleteRecord(JournalBase.java:183) [artemis-journal-1.1.0.jar:1.1.0]
>> >>>>>>>>>>     at org.apache.activemq.artemis.core.journal.impl.JournalImpl.appendDeleteRecord(JournalImpl.java:79) [artemis-journal-1.1.0.jar:1.1.0]
>> >>>>>>>>>>     at org.apache.activemq.artemis.core.persistence.impl.journal.JournalStorageManager.deleteID(JournalStorageManager.java:1194) [artemis-server-1.1.0.jar:1.1.0]
>> >>>>>>>>>>     at org.apache.activemq.artemis.core.persistence.impl.journal.BatchingIDGenerator.deleteID(BatchingIDGenerator.java:152) [artemis-server-1.1.0.jar:1.1.0]
>> >>>>>>>>>>     at org.apache.activemq.artemis.core.persistence.impl.journal.BatchingIDGenerator.cleanup(BatchingIDGenerator.java:75) [artemis-server-1.1.0.jar:1.1.0]
>> >>>>>>>>>>     at org.apache.activemq.artemis.core.persistence.impl.journal.JournalStorageManager.loadBindingJournal(JournalStorageManager.java:1784) [artemis-server-1.1.0.jar:1.1.0]
>> >>>>>>>>>>     at org.apache.activemq.artemis.core.server.impl.ActiveMQServerImpl.loadJournals(ActiveMQServerImpl.java:1625) [artemis-server-1.1.0.jar:1.1.0]
>> >>>>>>>>>>     at org.apache.activemq.artemis.core.server.impl.ActiveMQServerImpl.initialisePart2(ActiveMQServerImpl.java:1535) [artemis-server-1.1.0.jar:1.1.0]
>> >>>>>>>>>>     at org.apache.activemq.artemis.core.server.impl.SharedNothingBackupActivation.run(SharedNothingBackupActivation.java:249) [artemis-server-1.1.0.jar:1.1.0]
>> >>>>>>>>>>     at java.lang.Thread.run(Thread.java:745) [rt.jar:1.8.0_60]
>> >>>>>>>>>>
>> >>>>>>>>>> 16:35:46,572 WARN [org.apache.activemq.artemis.core.server] AMQ222173: Queue jms.queue.DLQ is duplicated during reload. This queue will be renamed as jms.queue.DLQ-0
>> >>>>>>>>>>
>> >>>>>>>>>> 16:35:46,572 ERROR [org.apache.activemq.artemis.core.server] AMQ224000: Failure in initialisation: java.lang.IllegalStateException: Cursor 2 had already been created
>> >>>>>>>>>>     at org.apache.activemq.artemis.core.paging.cursor.impl.PageCursorProviderImpl.createSubscription(PageCursorProviderImpl.java:97) [artemis-server-1.1.0.jar:1.1.0]
>> >>>>>>>>>>     at org.apache.activemq.artemis.core.server.impl.PostOfficeJournalLoader.initQueues(PostOfficeJournalLoader.java:140) [artemis-server-1.1.0.jar:1.1.0]
>> >>>>>>>>>>     at org.apache.activemq.artemis.core.server.impl.ActiveMQServerImpl.loadJournals(ActiveMQServerImpl.java:1631) [artemis-server-1.1.0.jar:1.1.0]
>> >>>>>>>>>>     at org.apache.activemq.artemis.core.server.impl.ActiveMQServerImpl.initialisePart2(ActiveMQServerImpl.java:1535) [artemis-server-1.1.0.jar:1.1.0]
>> >>>>>>>>>>     at org.apache.activemq.artemis.core.server.impl.SharedNothingBackupActivation.run(SharedNothingBackupActivation.java:249) [artemis-server-1.1.0.jar:1.1.0]
>> >>>>>>>>>>     at java.lang.Thread.run(Thread.java:745) [rt.jar:1.8.0_60]
>> >>>>>>>>>>
>> >>>>>>>>>> As a result both master and the slave remain unaccessible and no further
>> >>>>>>>>>> restarts solve the situation.
>> >>>>>>>>>>
>> >>>>>>>>>> Attached also master and slave broker.xml files.
>> >>>>>>>>>>
>> >>>>>>>>>> BR!
>> >>>>>>>>>>
>> >>>>>>>>>> Mihkel Nõges
>> >>>>>>>>>>
>> >>>>>>>>>
>> >>>>>>>>> --
>> >>>>>>>>> Clebert Suconic
>> >>>>>>>>>
>> >
>> >>
>>
>> --
>> Clebert Suconic
>>

-- 
Clebert Suconic