lucene-solr-user mailing list archives

From Mark Miller <markrmil...@gmail.com>
Subject Re: SolrCloud multiple data center support
Date Mon, 03 Feb 2014 17:44:10 GMT
SolrCloud has not tackled multi data center yet.

I don’t think a or b are very good options yet.

Honestly, I think the best current bet is to use something like Apache Flume to send data
to both data centers - it will handle retries and keeping things in sync and splitting the
stream. Doesn’t satisfy all use cases though.
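A minimal sketch of the fan-out idea. It uses a hypothetical `FanOutWriter` class in Python, not Flume itself and not Flume's API. Every update is queued once per data center, and each destination is drained and retried independently, so an outage in one data center does not block indexing into the other. The per-destination "sink" callables are hypothetical stand-ins for whatever actually posts documents to each cluster.

```python
from collections import deque
from typing import Callable, Dict

# Hypothetical sink: indexes one document into one data center's cluster
# and raises an exception on failure.
Sink = Callable[[dict], None]


class FanOutWriter:
    """Replicate every update to all data centers, buffering per destination
    so one unreachable data center does not hold up the others."""

    def __init__(self, sinks: Dict[str, Sink], max_attempts: int = 3):
        self.sinks = sinks
        self.max_attempts = max_attempts
        # One pending queue per destination (roughly one channel per sink).
        self.pending: Dict[str, deque] = {name: deque() for name in sinks}

    def submit(self, doc: dict) -> None:
        # Split the stream: every destination gets its own copy of the update.
        for queue in self.pending.values():
            queue.append(doc)

    def flush(self) -> Dict[str, int]:
        """Try to drain every queue in order; return docs still pending per destination."""
        for name, queue in self.pending.items():
            sink = self.sinks[name]
            while queue:
                doc = queue[0]
                for _ in range(self.max_attempts):
                    try:
                        sink(doc)
                        break
                    except Exception:
                        continue  # retry this document against this destination
                else:
                    # All attempts failed: keep the doc at the head of the queue
                    # (preserving order) and try again on the next flush.
                    break
                queue.popleft()
        return {name: len(q) for name, q in self.pending.items()}
```

Keeping a separate queue per data center is what lets the two sites drift apart temporarily during an outage and then catch up, in order, once the failing destination recovers.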

At some point, multi data center support will happen.

I can’t remember where ZooKeeper’s support for it is at, but with that and some logic
to favor nodes in your data center, that might be a viable route.
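The "favor nodes in your data center" logic could look roughly like the hypothetical client-side helper below. It assumes each replica is described by a dict with `url`, `dc`, and `live` keys; these names are illustrative and are not SolrCloud's cluster-state format. Healthy local replicas are tried first, and remote ones are used only as a fallback.

```python
import random
from typing import List, Optional, Sequence


def order_replicas(replicas: Sequence[dict], local_dc: str,
                   rng: Optional[random.Random] = None) -> List[dict]:
    """Return live replicas ordered so that those in local_dc come first.

    Each replica is a dict with hypothetical keys 'url', 'dc', and 'live'.
    """
    rng = rng or random.Random()
    live = [r for r in replicas if r["live"]]
    local = [r for r in live if r["dc"] == local_dc]
    remote = [r for r in live if r["dc"] != local_dc]
    # Shuffle within each tier to spread load, but never put a remote
    # replica ahead of a healthy local one.
    rng.shuffle(local)
    rng.shuffle(remote)
    return local + remote
```

A query client would try the returned URLs in order, so cross-data-center traffic only happens when no local replica is available.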

- Mark

http://about.me/markrmiller

On Feb 3, 2014, at 11:48 AM, Darrell Burgan <Darrell.Burgan@infor.com> wrote:

> Hello, we are using Solr in a SolrCloud configuration, with two Solr instances running
> with three Zookeepers in a single data center. We presently have a single search index with
> about 35 million entries in it, about 60GB disk space on each of the two Solr servers (120GB
> total). I would expect our usage of Solr to grow to include other search indexes, and likely
> larger data volumes.
>  
> I’m writing because we’re needing to grow beyond a single data center, with two (potentially
> incompatible) goals:
>  
> 1. We need to be able to have a hot disaster recovery site, in a completely separate
> data center, that has a near-realtime replica of the search index.
> 
> 2. We’d like to have the option to have multiple active/active data centers that
> each see and update the same search index, distributed across data centers.
>  
> The options I’m aware of from reading archives:
>  
> a. Simply set up the remote Solr instances as active parts of the same SolrCloud
> cluster. This will essentially involve us standing up multiple Zookeepers in the second data
> center, and multiple Solr instances, and they will all keep each other in sync magically.
> This will also solve both of our goals. However, I’m concerned about performance and whether
> SolrCloud is smart enough to route local search queries only to local Solr servers … ? Also,
> how does such a cluster tolerate and recover from network partitions?
> 
> b. The remote Solr instances form their own completely unrelated SolrCloud cluster.
> I have to invent some kind of replication logic of my own to sync data between them. This
> replication would have to be bidirectional to satisfy both of our goals. I strongly dislike
> this option since the application really should not concern itself with data distribution.
> But I’ll do it if I must.
>  
> So my questions are:
>  
> - Can anyone give me any guidance as to option a? Anyone using this in a real
> production setting? Words of wisdom? Does it work?
> 
> -          Are there any other options that I’m not considering?
> 
> - What is Solr’s answer to such configurations (we can’t be alone in needing
> one)? Any big enhancements coming on the Solr road map to deal with this?
>  
> Thanks!
> Darrell Burgan
>  
>  
> 
> Darrell Burgan | Chief Architect, PeopleAnswers
> office: 214 445 2172 | mobile: 214 564 4450 | fax: 972 692 5386 | darrell.burgan@infor.com
> | http://www.infor.com
> 

