Subject: Re: Replication with an HA master
From: Robert Stewart
Date: Thu, 13 Oct 2011 16:01:38 -0400
To: solr-user@lucene.apache.org

Yes, that is a good point. Thanks.

I think I will avoid using NAS/SAN and use two masters, one set up as a repeater (slave and master). In case of a very rare master failure, some minor manual intervention will be required to reconfigure the remaining master or bring the other one back up.
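As a rough sketch, the repeater's solrconfig.xml would declare the replication handler as both master and slave (hostnames, ports, and confFiles below are placeholders; this assumes the stock Solr ReplicationHandler):

    <!-- Repeater node: pulls the index from the live master and can itself
         serve replicas to the query slaves. -->
    <requestHandler name="/replication" class="solr.ReplicationHandler">
      <lst name="master">
        <!-- publish a new index version to downstream slaves after each commit -->
        <str name="replicateAfter">commit</str>
        <str name="replicateAfter">startup</str>
        <str name="confFiles">schema.xml,stopwords.txt</str>
      </lst>
      <lst name="slave">
        <!-- "live-master" is a placeholder hostname -->
        <str name="masterUrl">http://live-master:8983/solr/replication</str>
        <str name="pollInterval">00:00:30</str>
      </lst>
    </requestHandler>

If the live master dies, promoting the repeater is then mostly a matter of pointing the slaves (and the SolrJ indexing client) at it, rather than rebuilding an index from scratch.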
My only concern in that case is losing new documents from the SolrJ client, since there is no broker/buffer/queue between the SolrJ client and the Solr master. It would be nice if there were some open source broker/queue that could sit between SolrJ and Solr and queue up messages (publish/subscribe).

Bob

On Oct 13, 2011, at 3:56 PM, Jaeger, Jay - DOT wrote:

> One thing to consider is the case where the JVM is up, but the system is otherwise unavailable (say, a NIC failure, firewall failure, load balancer failure) - especially if you use a SAN (whose connection is different from the normal network).
>
> In such a case the old master might have uncommitted updates.
>
> JRJ
>
> -----Original Message-----
> From: Otis Gospodnetic [mailto:otis_gospodnetic@yahoo.com]
> Sent: Tuesday, October 11, 2011 3:17 PM
> To: solr-user@lucene.apache.org
> Subject: Re: Replication with an HA master
>
> Hello,
>
> ----- Original Message -----
>
>> From: Robert Stewart
>> To: solr-user@lucene.apache.org
>> Sent: Tuesday, October 11, 2011 3:37 PM
>> Subject: Re: Replication with an HA master
>>
>> In the case of using a shared (SAN) index between 2 masters, what happens if the live master fails in such a way that the index remains "locked" (such as if there is some hardware failure and it did not unlock/close the index)? Will the other master be able to open/write to the index as new documents are added?
>
> You'd use native locks, which should disappear if the JVM dies. If it does not, then I'm not 100% sure what happens, but in the worst case there would be a need for a quick manual (or scripted) intervention. But your index would be up to date!
>
>> Also, if that can work OK, would it work if you have an LB (VIP) on both the indexing and replication sides of the 2 masters, such that some VIP is used by SolrJ for indexing new documents via HTTP, and the same VIP is used by slave searchers for replication? That sounds like it would work.
>
> Precisely what you should do. E.g. "master-vip" is the "hostname" that both SolrJ would post new docs to and the master "server" slaves would poll for index changes.
>
> Otis
> ----
> Sematext :: http://sematext.com/ :: Solr - Lucene - Nutch
> Lucene ecosystem search :: http://search-lucene.com/
>
>> On Oct 11, 2011, at 3:16 PM, Otis Gospodnetic wrote:
>>
>>> Hello,
>>>
>>> Yes, you've read about NFS, which is why I gave the example of a SAN (which can have multiple power supplies, controllers, etc.).
>>>
>>> Yes, it should be OK to have multiple Solr instances have the same index open, since only one of them will actually be writing to it, thanks to the LB.
>>>
>>> Otis
>>> ----
>>> Sematext :: http://sematext.com/ :: Solr - Lucene - Nutch
>>> Lucene ecosystem search :: http://search-lucene.com/
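In config terms, Otis's two suggestions above come down to (a) pointing every slave's masterUrl at the VIP rather than at a physical master host, and (b) using native locks on the shared index. A minimal sketch of the relevant solrconfig.xml pieces, assuming the stock ReplicationHandler; "master-vip" and the port are placeholders:

    <!-- On each slave: replicate through the VIP so a master failover is
         transparent to the slaves. -->
    <requestHandler name="/replication" class="solr.ReplicationHandler">
      <lst name="slave">
        <str name="masterUrl">http://master-vip:8983/solr/replication</str>
      </lst>
    </requestHandler>

    <!-- On the masters sharing the SAN index: native (OS-level) locks,
         which should be released when the JVM holding them dies. -->
    <mainIndex>
      <lockType>native</lockType>
    </mainIndex>

SolrJ clients would likewise post updates to http://master-vip:8983/solr instead of to a specific master machine.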
>>>> ________________________________
>>>> From: Brandon Ramirez
>>>> To: "solr-user@lucene.apache.org"
>>>> Sent: Tuesday, October 11, 2011 2:55 PM
>>>> Subject: RE: Replication with an HA master
>>>>
>>>> Using a shared volume crossed my mind too, but I discarded the idea because of literature I have read about Lucene performing poorly against remote file systems. But then I suppose a SAN wouldn't be a remote file system in the same sense as an NFS-mounted NAS or similar.
>>>>
>>>> Should I be concerned about two Solr instances on two machines having the same SAN-based index open, as long as only one of them is receiving requests? I would think in theory it would work, but I don't have any production-level experience with Solr yet, only textbook knowledge.
>>>>
>>>> Brandon Ramirez | Office: 585.214.5413 | Fax: 585.295.4848
>>>> Software Engineer II | Element K | www.elementk.com
>>>>
>>>> -----Original Message-----
>>>> From: Otis Gospodnetic [mailto:otis_gospodnetic@yahoo.com]
>>>> Sent: Tuesday, October 11, 2011 2:28 PM
>>>> To: solr-user@lucene.apache.org
>>>> Subject: Re: Replication with an HA master
>>>>
>>>> A few alternatives:
>>>> * Have the master keep the index on a shared disk (e.g. SAN)
>>>> * Use the LB to easily switch between masters, potentially even automatically if the LB can detect that the primary is down
>>>>
>>>> Otis
>>>> ----
>>>> Sematext :: http://sematext.com/ :: Solr - Lucene - Nutch
>>>> Lucene ecosystem search :: http://search-lucene.com/
>>>>
>>>>> ________________________________
>>>>> From: Robert Stewart
>>>>> To: solr-user@lucene.apache.org
>>>>> Sent: Friday, October 7, 2011 10:22 AM
>>>>> Subject: Re: Replication with an HA master
>>>>>
>>>>> Your idea sounds like the correct path. Set up 2 masters, one running in "slave" mode which pulls replicas from the live master. When/if the live master goes down, you just reconfigure and restart the backup master to be the live master. You'd also need to then start data import on the backup master (enable/start a cron job?) and redirect slave searchers to pull replicas from the new live master. All that could be done using scripts or something like Puppet, possibly.
>>>>>
>>>>> Another solution may be to run 2 "live" masters, which both index the same content from the same data source. If one goes down, then you just need to redirect slave searchers to the backup master for replication.
>>>>>
>>>>> I am also starting a similar project which needs some disaster recovery processes in place, so any other info would be useful to me as well.
>>>>>
>>>>> Bob
>>>>>
>>>>> On Oct 7, 2011, at 9:53 AM, Brandon Ramirez wrote:
>>>>>
>>>>>> We are getting ready to start a project using Solr as our backend search engine and I am trying to devise a deployment architecture that works for us. We definitely need a master/slave replication strategy, that's for sure, but my concern is that the master becomes a single point of failure.
>>>>>>
>>>>>> Fortunately, real-time search is not a requirement for us. If search results are a few minutes out of sync with our database, it's not a big deal.
>>>>>>
>>>>>> So what I would like to do is have a set of query servers (slaves) that are only used for querying, no indexing, and have them use Solr's HTTP replication mechanism on a 2 or 3 minute interval. To get HA indexing, I'd like to have 2 masters: a primary and a standby. All indexing requests go to the primary unless it's taken out of service. To keep the standby ready to take over if it needs to, it needs to be more up to date than the slaves. I'd like to have it replicate every 30 seconds or so.
>>>>>>
>>>>>> The reason I'm asking about it on this list is that I haven't seen any Solr documentation or even anything that talks about this.
>>>>>> I can't be the only one concerned about having a single point of failure, so I'm reaching out to see what others have done in this case before I go with my own solution.
>>>>>>
>>>>>> Brandon Ramirez | Office: 585.214.5413 | Fax: 585.295.4848
>>>>>> Software Engineer II | Element K | www.elementk.com
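The two refresh rates Brandon describes map onto the pollInterval of each node's slave section inside the /replication handler. A minimal sketch, with only the intervals taken from his message; the hostname is a placeholder:

    <!-- Query slaves: a couple of minutes behind is acceptable. -->
    <lst name="slave">
      <str name="masterUrl">http://primary-master:8983/solr/replication</str>
      <!-- pollInterval format is HH:mm:ss -->
      <str name="pollInterval">00:02:00</str>
    </lst>

    <!-- Standby master: polls more often so it stays close enough to take over. -->
    <lst name="slave">
      <str name="masterUrl">http://primary-master:8983/solr/replication</str>
      <str name="pollInterval">00:00:30</str>
    </lst>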