lucene-solr-user mailing list archives

From Shawn Heisey <s...@elyograg.org>
Subject Re: Solr-Distributed search
Date Fri, 06 Jun 2014 14:49:18 GMT
On 6/6/2014 8:31 AM, Aman Tandon wrote:
> In my organisation we also want to implement the solrcloud, but the problem
> is that, we are using the master-slave architecture and on master we do all
> indexing, architecture of master is lower than the slaves.
>
> So if we implement the solrcloud in a fashion that master will be the
> leader, and slaves will be the replicas then in that case, in the case of
> high load leader can bear it,  I guess every query firstly goes to leader
> then it distributes the request as i noticed from the logs and blogs :)
>
> As well as master is in NY and slaves are in Dallas, which also might cause
> latency issue and it will instead fail our purpose of faster query response.
>
> So i thought to use this shards parameter so that we query only from the
> replicas not to the leader so that leader just work fine. But we were not
> sure about this shards parameter, what do you think? what should we do with
> latency issue and shards parameter.

SolrCloud does not yet have any way to prefer one set of replicas over
the others, so if you just send it requests, they would be sent to both
Dallas and New York, affecting search latency.  Local replica preference
is a desperately needed feature.

Old-style distributed search with the shards parameter, combined with
master/slave replication, is an effective way to be absolutely sure
which servers you are querying.
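
As a rough illustration of what such a request might look like, here
is a minimal Python sketch that builds a select URL with an explicit
shards parameter.  The host names, port, and core name are made up for
the example, and the function name is hypothetical; only the shards
parameter itself (a comma-separated list of host:port/path entries
without the http:// prefix) comes from Solr.

```python
from urllib.parse import urlencode


def build_shards_query(entry_host, shard_hosts, core, query, rows=10):
    """Build a distributed-search URL that names every shard explicitly.

    entry_host  -- host:port of the server that receives the request
                   (hypothetical value supplied by the caller)
    shard_hosts -- host:port of each server that should be queried
    core        -- core name, assumed to be the same on every server
    """
    # Each shards entry is host:port/solr/core, with no URL scheme.
    shards = ",".join(f"{host}/solr/{core}" for host in shard_hosts)
    params = {"q": query, "rows": rows, "wt": "json", "shards": shards}
    return f"http://{entry_host}/solr/{core}/select?{urlencode(params)}"


# Only the Dallas servers are listed, so New York is never queried.
dallas = ["dallas1.example.com:8983", "dallas2.example.com:8983"]
url = build_shards_query(dallas[0], dallas, "main", "title:solr")
```

Because the list of servers is spelled out in the request, nothing
outside that list takes part in the query.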

I would actually recommend that you get rid of replication and have your
index updating software update each copy of the index independently.
This is how I run my own Solr install.  It opens up a whole new set of
possibilities: you can change the schema and/or config on one set of
servers, or upgrade any component (Solr, Java, etc.), without affecting
the other set of servers at all.

One note: for the indexing approach I've outlined to actually work,
you must separately track which inserts/updates/deletes have been
applied to each server set.  If you don't, the sets can get out of sync
when you restart a server.  Also, without that tracking, having a
server down for an extended period of time might cause all indexing
activity to stop on BOTH server sets.
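
The per-set tracking can be sketched roughly as follows.  This is not
my actual indexing code, just a minimal Python illustration under some
assumptions: every change has an increasing sequence number, the
updates list is sorted by that number, and send is a hypothetical
callable that pushes one change to a server set and raises
ConnectionError when the set is unreachable.  A real program would
also persist last_applied somewhere durable.

```python
from dataclasses import dataclass
from typing import Callable, Dict, List, Tuple

# (sequence number, action, document id) -- an assumed update format.
Update = Tuple[int, str, str]


@dataclass
class ServerSetTracker:
    name: str
    send: Callable[[Update], None]  # hypothetical sender for this set
    last_applied: int = 0           # highest sequence number applied


def sync_all(trackers: List[ServerSetTracker],
             updates: List[Update]) -> Dict[str, bool]:
    """Apply pending updates to each server set independently."""
    results = {}
    for tracker in trackers:
        try:
            for update in updates:
                if update[0] <= tracker.last_applied:
                    continue  # this set already has the change
                tracker.send(update)
                tracker.last_applied = update[0]
            results[tracker.name] = True
        except ConnectionError:
            # This set is down; its position stays where it was and
            # the remaining sets are still processed.
            results[tracker.name] = False
    return results
```

Since each set keeps its own position, a set that was down simply
picks up from its own last_applied value on the next run, while the
other set keeps indexing in the meantime.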

Thanks,
Shawn

