Date: Thu, 6 Jun 2013 21:01:19 +0100
From: Robbie Gemmell
To: "users@qpid.apache.org", jamesbelch@verizon.net
Subject: Re: Questions on Java Broker BDB HA failover (was: [jira] [Commented] (QPID-4910) Python, Ruby, and C++ clients automatically connect to replica server when master fails)

Hi James,

It is probably worth clearing out any bdbha store files you have managed to create in your previous efforts and starting fresh. The bdb files themselves contain all the group information for the nodes once created, so it is possible that the steps taken so far have left the existing files on disk in a bit of a state.

Here is some example config that I used to start up a two-node cluster locally on the same machine, using port 5002 for the second node simply because I obviously couldn't have them both use the same port.

First node (master at this point):

    localhost
    org.apache.qpid.server.store.berkeleydb.BDBHAMessageStore
    /path/to/bdbhastore/node1
    ReplicationGroup
    node1
    localhost:5001
    localhost:5001
    NO_SYNC\,NO_SYNC\,SIMPLE_MAJORITY
    true
    true

Second node (replica at this point):

    localhost
    org.apache.qpid.server.store.berkeleydb.BDBHAMessageStore
    /path/to/bdbhastore/node2
    ReplicationGroup
    node2
    localhost:5002
    localhost:5001
    NO_SYNC\,NO_SYNC\,SIMPLE_MAJORITY
    true
    false

All you should have to do to bring up the nodes on different machines is change the hostname from localhost to whatever your nodes' actual hostnames are (making sure that each of the names is resolvable from the other host) and change the second node to use port 5001 if you want them both on 5001 on their respective machines, i.e.:

    node1hostname:5001
    node1hostname:5001

and

    node2hostname:5001
    node1hostname:5001

(In hindsight I shouldn't have called the virtualhost 'localhost'; it would possibly have been clearer if I had picked something else. The vhost name has no relationship to the bdbha node names or hostnames.)

Robbie

====================================================

On 6 June 2013 18:16, James Belch wrote:

I made the following changes:

master:
    host1:5001
    host1:5001

replica:
    host2:5001
    host1:5001

I still get the same error. Any ideas?

Thanks,
James

On 6 June 2013 18:03, Robbie Gemmell wrote:

> Hi James,
>
> There may be other issues, but the most obvious one is that the helper
> node configuration looks like it needs updated:
>
> "host1
> host1:5001
> host1:5002"
>
> The first node created/started should have its helper details set to its
> own node details, i.e :5001, allowing it to create the group
> and become master.
>
> "host2
> host2:5001
> host1:5002"
>
> When starting the subsequent node to become the replica it should also
> have its helper details set to the address of the first node, i.e
> :5001 again, allowing it to join the existing group.
>
> Robbie
>
> On 6 June 2013 17:38, Rob Godfrey wrote:
>
>> Resending from my gmail rather than apache account, as my apache account
>> doesn't seem to be able to post to users :-)
>>
>> On 6 June 2013 18:29, Robert Godfrey wrote:
>>
>> > Forwarding to the Qpid Users mail group - which is probably a better bet
>> > to get answers
>> >
>> > ---------- Forwarded message ----------
>> > From: James Belch (JIRA)
>> > Date: 6 June 2013 17:35
>> > Subject: [jira] [Commented] (QPID-4910) Python, Ruby, and C++ clients
>> > automatically connect to replica server when master fails
>> > To: rgodfrey@apache.org
>> >
>> > [ https://issues.apache.org/jira/browse/QPID-4910?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13677143#comment-13677143 ]
>> >
>> > James Belch commented on QPID-4910:
>> > -----------------------------------
>> >
>> > Thanks for the quick response. I think we will to configure the clients
>> > with a list of brokers for the non Java clients. Could you guys answer
>> > another question for me regarding failover. I am using Berkeley DB to
>> > implement our High Availability solution. I have the master configured
>> > as follows:
>> >
>> >     localhost
>> >
>> >     org.apache.qpid.server.store.berkeleydb.BDBHAMessageStore
>> >     ${work}/bdbhastore/host1
>> >
>> >     ReplicationGroup
>> >     host1
>> >     host1:5001
>> >     host1:5002
>> >     NO_SYNC\,NO_SYNC\,SIMPLE_MAJORITY
>> >     true
>> >     true
>> >
>> >     ...
>> >
>> > I have the replica configured as follows:
>> >
>> >     localhost
>> >
>> >     org.apache.qpid.server.store.berkeleydb.BDBHAMessageStore
>> >     ${work}/bdbhastore/host2
>> >
>> >     ReplicationGroup
>> >     host2
>> >     host2:5001
>> >     host1:5002
>> >     NO_SYNC\,NO_SYNC\,SIMPLE_MAJORITY
>> >     true
>> >     false
>> >
>> >     ...
>> >
>> > When I start the replica server, I get the following error: "New node
>> > host2(-1) unknown to rep group".
>> > If I do a netstat, I see the connections attempting to be made, but the
>> > sockets go to TIME_WAIT state and timeout after a minute. Any ideas?
>> >
>> > > Python, Ruby, and C++ clients automatically connect to replica server
>> > > when master fails
>> > > ---------------------------------------------------------------------------------------
>> > >
>> > > Key: QPID-4910
>> > > URL: https://issues.apache.org/jira/browse/QPID-4910
>> > > Project: Qpid
>> > > Issue Type: Improvement
>> > > Components: Java Broker
>> > > Affects Versions: 0.20
>> > > Environment: C++, Ruby, Python, and Java clients connecting to a
>> > > Java Broker running on Redhat 6.3
>> > > Reporter: James Belch
>> > > Fix For: 0.23
>> > >
>> > >
>> > > I am currently in the process of designing a high availability solution
>> > > for our software. We are using the Java broker, and we have Java, Ruby,
>> > > C++, and Python clients. I was reading your High Availability document at
>> > > http://qpid.apache.org/books/0.18/AMQP-Messaging-Broker-Java-Book/html/High-Availability.html
>> > > and saw a footnote at the bottom stating "[1] The automatic failover
>> > > feature is available only for AMQP connections from the Java client.
>> > > Management connections (JMX) do not current offer this feature." Is this
>> > > still the case or was this fixed in .20? If this is still the case, will
>> > > it be fixed in a future release?
>> >
>> > --
>> > This message is automatically generated by JIRA.
>> > If you think it was sent incorrectly, please contact your JIRA
>> > administrators
>> > For more information on JIRA, see: http://www.atlassian.com/software/jira
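
As a rough illustration of the helper-address rule described in this thread (the first node's helper is its own address, and each later node's helper is the first node's address), here is a minimal Python sketch. The dictionary keys `name`, `address`, and `helper` and the function name `check_helper_addresses` are hypothetical names chosen for this example; they are not Qpid configuration keys or part of any Qpid API. The sketch only compares strings and does not contact a broker.

```python
def check_helper_addresses(nodes):
    """Check helper addresses against the rule described in the thread.

    `nodes` is a list of dicts in start-up order, each with the
    hypothetical keys "name", "address" (the node's own host:port) and
    "helper" (the helper host:port). Returns a list of problem strings;
    an empty list means every helper points at the first node's address.
    """
    problems = []
    if not nodes:
        return problems

    # The first node creates the group, so its own address is the one
    # every node (including itself) should name as the helper.
    expected = nodes[0]["address"]
    for node in nodes:
        if node["helper"] != expected:
            problems.append(
                f"{node['name']}: helper {node['helper']} should be {expected}"
            )
    return problems


# Values from the configuration originally posted in the thread.
original = [
    {"name": "host1", "address": "host1:5001", "helper": "host1:5002"},
    {"name": "host2", "address": "host2:5001", "helper": "host1:5002"},
]

# Values from the two-node example run on one machine.
example = [
    {"name": "node1", "address": "localhost:5001", "helper": "localhost:5001"},
    {"name": "node2", "address": "localhost:5002", "helper": "localhost:5001"},
]

for label, nodes in (("original", original), ("example", example)):
    issues = check_helper_addresses(nodes)
    print(label, "OK" if not issues else issues)
```

With the values above, the originally posted settings are flagged for both nodes (each helper names port 5002, where nothing in the group is listening), while the two-node example passes. A check like this cannot detect stale bdbha store files left over from earlier attempts, so clearing those out is still a separate step.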