Subject: Re: Replication with an HA master
From: Robert Stewart
Date: Thu, 13 Oct 2011 16:01:38 -0400
To: solr-user@lucene.apache.org

Yes, that is a good point. Thanks.

I think I will avoid using NAS/SAN and use two masters, one set up as a repeater (slave and master). In case of a very rare master failure, some minor manual intervention will be required to reconfigure the remaining master or bring the other one back up.
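As a rough sketch, the repeater's solrconfig.xml would declare the replication handler as both master and slave (hostnames, ports, and confFiles below are placeholders; this assumes the stock Solr ReplicationHandler):

    <!-- Repeater node: pulls the index from the live master and can itself
         serve replicas to the query slaves. -->
    <requestHandler name="/replication" class="solr.ReplicationHandler">
      <lst name="master">
        <!-- publish a new index version to downstream slaves after each commit -->
        <str name="replicateAfter">commit</str>
        <str name="replicateAfter">startup</str>
        <str name="confFiles">schema.xml,stopwords.txt</str>
      </lst>
      <lst name="slave">
        <!-- "live-master" is a placeholder hostname -->
        <str name="masterUrl">http://live-master:8983/solr/replication</str>
        <str name="pollInterval">00:00:30</str>
      </lst>
    </requestHandler>

If the live master dies, promoting the repeater is then mostly a matter of pointing the slaves (and the SolrJ indexing client) at it, rather than rebuilding an index from scratch.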
My only concern in that case is losing new documents from the SolrJ client, since there is no broker/buffer/queue between the SolrJ client and the Solr master. It would be nice if there were some open source broker/queue that could sit between SolrJ and Solr and queue up messages (publish/subscribe).

Bob

On Oct 13, 2011, at 3:56 PM, Jaeger, Jay - DOT wrote:

> One thing to consider is the case where the JVM is up, but the system is otherwise unavailable (say, a NIC failure, firewall failure, load balancer failure) - especially if you use a SAN (whose connection is different from the normal network).
>
> In such a case the old master might have uncommitted updates.
>
> JRJ
>
> -----Original Message-----
> From: Otis Gospodnetic [mailto:otis_gospodnetic@yahoo.com]
> Sent: Tuesday, October 11, 2011 3:17 PM
> To: solr-user@lucene.apache.org
> Subject: Re: Replication with an HA master
>
> Hello,
>
> ----- Original Message -----
>
>> From: Robert Stewart
>> To: solr-user@lucene.apache.org
>> Sent: Tuesday, October 11, 2011 3:37 PM
>> Subject: Re: Replication with an HA master
>>
>> In the case of using a shared (SAN) index between 2 masters, what happens if the live master fails in such a way that the index remains "locked" (such as if there is some hardware failure and it did not unlock/close the index)? Will the other master be able to open/write to the index as new documents are added?
>
> You'd use native locks, which should disappear if the JVM dies. If it does not, then I'm not 100% sure what happens, but in the worst case there would be a need for a quick manual (or scripted) intervention. But your index would be up to date!
>
>> Also, if that can work OK, would it work if you have an LB (VIP) on both the indexing and replication sides of the 2 masters, such that some VIP is used by SolrJ for indexing new documents via HTTP, and the same VIP is used by slave searchers for replication? That sounds like it would work.
>
> Precisely what you should do. E.g. "master-vip" is the "hostname" that both SolrJ would post new docs to and the master "server" slaves would poll for index changes.
>
> Otis
> ----
> Sematext :: http://sematext.com/ :: Solr - Lucene - Nutch
> Lucene ecosystem search :: http://search-lucene.com/
>
>> On Oct 11, 2011, at 3:16 PM, Otis Gospodnetic wrote:
>>
>>> Hello,
>>>
>>> Yes, you've read about NFS, which is why I gave the example of a SAN (which can have multiple power supplies, controllers, etc.).
>>>
>>> Yes, it should be OK to have multiple Solr instances have the same index open, since only one of them will actually be writing to it, thanks to the LB.
>>>
>>> Otis
>>> ----
>>> Sematext :: http://sematext.com/ :: Solr - Lucene - Nutch
>>> Lucene ecosystem search :: http://search-lucene.com/
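In config terms, Otis's two suggestions above come down to (a) pointing every slave's masterUrl at the VIP rather than at a physical master host, and (b) using native locks on the shared index. A minimal sketch of the relevant solrconfig.xml pieces, assuming the stock ReplicationHandler; "master-vip" and the port are placeholders:

    <!-- On each slave: replicate through the VIP so a master failover is
         transparent to the slaves. -->
    <requestHandler name="/replication" class="solr.ReplicationHandler">
      <lst name="slave">
        <str name="masterUrl">http://master-vip:8983/solr/replication</str>
      </lst>
    </requestHandler>

    <!-- On the masters sharing the SAN index: native (OS-level) locks,
         which should be released when the JVM holding them dies. -->
    <mainIndex>
      <lockType>native</lockType>
    </mainIndex>

SolrJ clients would likewise post updates to http://master-vip:8983/solr instead of to a specific master machine.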
>>>> ________________________________
>>>> From: Brandon Ramirez
>>>> To: "solr-user@lucene.apache.org"
>>>> Sent: Tuesday, October 11, 2011 2:55 PM
>>>> Subject: RE: Replication with an HA master
>>>>
>>>> Using a shared volume crossed my mind too, but I discarded the idea because of literature I have read about Lucene performing poorly against remote file systems. But then I suppose a SAN wouldn't be a remote file system in the same sense as an NFS-mounted NAS or similar.
>>>>
>>>> Should I be concerned about two Solr instances on two machines having the same SAN-based index open, as long as only one of them is receiving requests? I would think in theory it would work, but I don't have any production-level experience with Solr yet, only textbook knowledge.
>>>>
>>>> Brandon Ramirez | Office: 585.214.5413 | Fax: 585.295.4848
>>>> Software Engineer II | Element K | www.elementk.com
>>>>
>>>> -----Original Message-----
>>>> From: Otis Gospodnetic [mailto:otis_gospodnetic@yahoo.com]
>>>> Sent: Tuesday, October 11, 2011 2:28 PM
>>>> To: solr-user@lucene.apache.org
>>>> Subject: Re: Replication with an HA master
>>>>
>>>> A few alternatives:
>>>> * Have the master keep the index on a shared disk (e.g. SAN)
>>>> * Use the LB to easily switch between masters, potentially even automatically if the LB can detect that the primary is down
>>>>
>>>> Otis
>>>> ----
>>>> Sematext :: http://sematext.com/ :: Solr - Lucene - Nutch
>>>> Lucene ecosystem search :: http://search-lucene.com/
>>>>
>>>>> ________________________________
>>>>> From: Robert Stewart
>>>>> To: solr-user@lucene.apache.org
>>>>> Sent: Friday, October 7, 2011 10:22 AM
>>>>> Subject: Re: Replication with an HA master
>>>>>
>>>>> Your idea sounds like the correct path. Set up 2 masters, one running in "slave" mode which pulls replicas from the live master. When/if the live master goes down, you just reconfigure and restart the backup master to be the live master. You'd also need to then start data import on the backup master (enable/start a cron job?) and redirect slave searchers to pull replicas from the new live master. All that could be done using scripts or something like Puppet, possibly.
>>>>>
>>>>> Another solution may be to run 2 "live" masters, which both index the same content from the same data source. If one goes down, then you just need to redirect slave searchers to the backup master for replication.
>>>>>
>>>>> I am also starting a similar project which needs some disaster recovery processes in place, so any other info would be useful to me as well.
>>>>>
>>>>> Bob
>>>>>
>>>>> On Oct 7, 2011, at 9:53 AM, Brandon Ramirez wrote:
>>>>>
>>>>>> We are getting ready to start a project using Solr as our backend search engine and I am trying to devise a deployment architecture that works for us. We definitely need a master/slave replication strategy, that's for sure, but my concern is that the master becomes a single point of failure.
>>>>>>
>>>>>> Fortunately, real-time search is not a requirement for us. If search results are a few minutes out of sync with our database, it's not a big deal.
>>>>>>
>>>>>> So what I would like to do is have a set of query servers (slaves) that are only used for querying, no indexing, and have them use Solr's HTTP replication mechanism on a 2 or 3 minute interval. To get HA indexing, I'd like to have 2 masters: a primary and a standby. All indexing requests go to the primary unless it's taken out of service. To keep the standby ready to take over if it needs to, it needs to be more up to date than the slaves. I'd like to have it replicate every 30 seconds or so.
>>>>>>
>>>>>> The reason I'm asking about it on this list is that I haven't seen any Solr documentation or even anything that talks about this.
>>>>>> I can't be the only one concerned about having a single point of failure, so I'm reaching out to see what others have done in this case before I go with my own solution.
>>>>>>
>>>>>> Brandon Ramirez | Office: 585.214.5413 | Fax: 585.295.4848
>>>>>> Software Engineer II | Element K | www.elementk.com
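The two refresh rates Brandon describes map onto the pollInterval of each node's slave section inside the /replication handler. A minimal sketch, with only the intervals taken from his message; the hostname is a placeholder:

    <!-- Query slaves: a couple of minutes behind is acceptable. -->
    <lst name="slave">
      <str name="masterUrl">http://primary-master:8983/solr/replication</str>
      <!-- pollInterval format is HH:mm:ss -->
      <str name="pollInterval">00:02:00</str>
    </lst>

    <!-- Standby master: polls more often so it stays close enough to take over. -->
    <lst name="slave">
      <str name="masterUrl">http://primary-master:8983/solr/replication</str>
      <str name="pollInterval">00:00:30</str>
    </lst>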