lucene-solr-user mailing list archives

From Jake Luciani <jak...@gmail.com>
Subject Re: True master-master fail-over without data gaps (choosing CA in CAP)
Date Wed, 09 Mar 2011 23:04:13 GMT
Jason,

Its predecessor, Lucandra, did. But Solandra is a new approach that manages
shards of documents across the cluster for you and uses Solr's distributed
search to query indexes.
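
(For anyone not familiar with Solr's distributed search: a query fans out
across shards via the shards parameter, along these lines, hostnames
illustrative:

  http://host1:8983/solr/select?q=foo&shards=host1:8983/solr,host2:8983/solr

Solandra manages that shard list for you from the cluster state, so you
don't hand-build it.)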

Jake

On Mar 9, 2011, at 5:15 PM, Jason Rutherglen <jason.rutherglen@gmail.com> wrote:

> Doesn't Solandra partition by term instead of document?
> 
> On Wed, Mar 9, 2011 at 2:13 PM, Smiley, David W. <dsmiley@mitre.org> wrote:
>> I was just about to jump in this conversation to mention Solandra and,
>> go figure, Solandra's committer comes in. :-)  It was nice to meet you
>> at Strata, Jake.
>> 
>> I haven't dug into the code yet but Solandra strikes me as a killer way
>> to scale Solr. I'm looking forward to playing with it; particularly
>> looking at disk requirements and performance measurements.
>> 
>> ~ David Smiley
>> 
>> On Mar 9, 2011, at 3:14 PM, Jake Luciani wrote:
>> 
>>> Hi Otis,
>>> 
>>> Have you considered using Solandra with Quorum writes
>>> to achieve master/master with CA semantics?
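>>>
>>> (Quorum here in the Cassandra sense: with replication factor N and
>>> quorum reads and writes, W = R = floor(N/2) + 1, so W + R > N and every
>>> read overlaps the latest successful write; e.g., N = 3 gives W = R = 2.)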
>>> 
>>> -Jake
>>> 
>>> 
>>> On Wed, Mar 9, 2011 at 2:48 PM, Otis Gospodnetic
>>> <otis_gospodnetic@yahoo.com> wrote:
>>> 
>>>> Hi,
>>>> 
>>>> ----- Original Message -----
>>>> 
>>>>> From: Robert Petersen <robertpe@buy.com>
>>>>> 
>>>>> Can't you skip the SAN and keep the indexes locally?  Then you would
>>>>> have two redundant copies of the index and no lock issues.
>>>> 
>>>> I could, but then I'd have the issue of keeping them in sync, which
>>>> seems more fragile.  I think SAN makes things simpler overall.
>>>> 
>>>>> Also, can't master02 just be a slave to master01 (in the master farm
>>>>> and separate from the slave farm) until such time as master01 fails?
>>>>> Then
>>>> 
>>>> No, because it wouldn't be in sync.  It would always be N minutes
>>>> behind, and when the primary master fails, the secondary would not have
>>>> all the docs - data loss.
>>>> 
>>>>> master02 would start receiving the new documents with an index
>>>>> complete up to the last replication at least, and the other slaves
>>>>> would be directed by LB to poll master02 also...
>>>> 
>>>> Yeah, "complete up to the last replication" is the problem.  It's a
>>>> data gap that now needs to be filled somehow.
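>>>>
>>>> (For reference, the polling setup Robert describes is Solr's stock
>>>> replication handler; a minimal sketch of the two solrconfig.xml sides,
>>>> hostnames and intervals illustrative:
>>>>
>>>>   <!-- on master01 -->
>>>>   <requestHandler name="/replication" class="solr.ReplicationHandler">
>>>>     <lst name="master">
>>>>       <str name="replicateAfter">commit</str>
>>>>     </lst>
>>>>   </requestHandler>
>>>>
>>>>   <!-- on master02, polling as a slave until failover -->
>>>>   <requestHandler name="/replication" class="solr.ReplicationHandler">
>>>>     <lst name="slave">
>>>>       <str name="masterUrl">http://master01:8983/solr/replication</str>
>>>>       <str name="pollInterval">00:01:00</str>
>>>>     </lst>
>>>>   </requestHandler>
>>>>
>>>> That pollInterval is exactly the "N minutes behind" window mentioned
>>>> above.)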
>>>> 
>>>> Otis
>>>> ----
>>>> Sematext :: http://sematext.com/ :: Solr - Lucene - Nutch
>>>> Lucene ecosystem search :: http://search-lucene.com/
>>>> 
>>>> 
>>>>> -----Original Message-----
>>>>> From: Otis Gospodnetic [mailto:otis_gospodnetic@yahoo.com]
>>>>> Sent: Wednesday, March 09, 2011 9:47 AM
>>>>> To: solr-user@lucene.apache.org
>>>>> Subject: Re: True master-master fail-over without data gaps (choosing
>>>>> CA in CAP)
>>>>> 
>>>>> Hi,
>>>>> 
>>>>> 
>>>>> ----- Original Message -----
>>>>>> From: Walter Underwood <wunder@wunderwood.org>
>>>>>
>>>>>> On Mar 9, 2011, at 9:02 AM, Otis Gospodnetic wrote:
>>>>>> 
>>>>>>> You mean it's not possible to have 2 masters that are in nearly
>>>>>>> real-time sync? How about with DRBD? I know people use DRBD to keep
>>>>>>> 2 Hadoop NNs (their edit logs) in sync to avoid the current NN SPOF,
>>>>>>> for example, so I'm thinking this could be doable with Solr masters,
>>>>>>> too, no?
>>>>>> 
>>>>>> If you add fault tolerance, you run into the CAP Theorem.
>>>>>> Consistency, availability, partition tolerance: choose two. You
>>>>>> cannot have it all.
>>>>> 
>>>>> Right, so I'll take Consistency and Availability, and I'll put my 2
>>>>> masters in the same rack (which has redundant switches, power supply,
>>>>> etc.) and thus minimize/avoid partitioning.
>>>>> Assuming the above actually works, I think my Q remains:
>>>>>
>>>>> How do you set up 2 Solr masters so they are in near real-time sync?
>>>>> DRBD?
>>>>> 
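>>>>> (A DRBD setup for this would be a sketch along these lines: one
>>>>> resource replicating the index volume between the two masters;
>>>>> hostnames, devices, and addresses illustrative:
>>>>>
>>>>>   resource solr-index {
>>>>>     protocol C;    # synchronous: a write completes on both nodes
>>>>>     on master01 {
>>>>>       device    /dev/drbd0;
>>>>>       disk      /dev/sdb1;
>>>>>       address   10.0.0.1:7789;
>>>>>       meta-disk internal;
>>>>>     }
>>>>>     on master02 {
>>>>>       device    /dev/drbd0;
>>>>>       disk      /dev/sdb1;
>>>>>       address   10.0.0.2:7789;
>>>>>       meta-disk internal;
>>>>>     }
>>>>>   }
>>>>>
>>>>> Protocol C makes writes synchronous, so the standby copy is always
>>>>> current, but only the DRBD primary can mount the volume, so failover
>>>>> still involves a promotion step.)
>>>>>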
>>>>> But here is maybe a simpler scenario that more people may be
>>>>> considering:
>>>>>
>>>>> Imagine 2 masters on 2 different servers in 1 rack, pointing to the
>>>>> same index on the shared storage (SAN) that also happens to live in
>>>>> the same rack.
>>>>> 2 Solr masters are behind 1 LB VIP that the indexer talks to.
>>>>> The VIP is configured so that all requests always get routed to the
>>>>> primary master (because only 1 master can be modifying an index at a
>>>>> time), except when this primary is down, in which case the requests
>>>>> are sent to the secondary master.
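>>>>>
>>>>> (That VIP policy is what HAProxy, for one, calls a backup server; a
>>>>> minimal sketch, names, addresses, and ports illustrative:
>>>>>
>>>>>   backend solr_masters
>>>>>     option httpchk GET /solr/admin/ping
>>>>>     server master01 10.0.0.1:8983 check
>>>>>     server master02 10.0.0.2:8983 check backup
>>>>>
>>>>> All indexing traffic goes to master01 while its health check passes;
>>>>> master02 only receives requests once master01 is marked down.)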
>>>>> 
>>>>> So in this case my Q is around automation of this, around Lucene
>>>>> index locks, around the need for manual intervention, and such.
>>>>> Concretely, if you have these 2 master instances, the primary master
>>>>> has the Lucene index lock in the index dir.  When the secondary master
>>>>> needs to take over (i.e., when it starts receiving documents via LB),
>>>>> it needs to be able to write to that same index.  But what if that
>>>>> lock is still around?  One could use the Native lock to make the lock
>>>>> disappear if the primary master's JVM exited unexpectedly, and in that
>>>>> case everything *should* work and be completely transparent, right?
>>>>> That is, the secondary will start getting new docs, it will use its
>>>>> IndexWriter to write to that same shared index, which won't be locked
>>>>> for writes because the lock is gone, and everyone will be happy.  Did
>>>>> I miss something important here?
>>>>> 
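>>>>> (The Native lock here is Lucene's NativeFSLockFactory, selected in
>>>>> solrconfig.xml; a sketch against the Solr 1.4/3.x config layout:
>>>>>
>>>>>   <indexDefaults>
>>>>>     <lockType>native</lockType>
>>>>>   </indexDefaults>
>>>>>
>>>>> Native locks are OS-level file locks, so the OS releases them when the
>>>>> JVM process dies, which is what makes the transparent-takeover case
>>>>> above plausible.)
>>>>>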
>>>>> Assuming the above is correct, what if the lock is *not* gone because
>>>>> the primary master's JVM is actually not dead, although maybe
>>>>> unresponsive, so the LB thinks the primary master is dead?  Then the
>>>>> LB will route indexing requests to the secondary master, which will
>>>>> attempt to write to the index, but be denied because of the lock.  So
>>>>> a human needs to jump in, remove the lock, and manually reindex failed
>>>>> docs if the upstream component doesn't buffer docs that failed to get
>>>>> indexed and doesn't retry indexing them automatically.  Is this
>>>>> correct, or is there a way to avoid humans here?
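>>>>>
>>>>> (The manual step, for concreteness, is deleting Lucene's lock file in
>>>>> the index directory, e.g.:
>>>>>
>>>>>   rm /san/solr/index/write.lock
>>>>>
>>>>> with the path illustrative, and only safe once you're sure the old
>>>>> writer is really gone.)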
>>>>> 
>>>>> Thanks,
>>>>> Otis
>>>>> ----
>>>>> Sematext :: http://sematext.com/ :: Solr - Lucene - Nutch
>>>>> Lucene ecosystem search :: http://search-lucene.com/
>>>>> 
>>>> 
>>> 
>>> 
>>> 
>>> --
>>> http://twitter.com/tjake
>> 
>> 
