lucene-solr-user mailing list archives

Site index · List index
Message view « Date » · « Thread »
Top « Date » · « Thread »
From Tom Evans <tevans...@googlemail.com>
Subject Re: Verifying - SOLR Cloud replaces load balancer?
Date Mon, 18 Apr 2016 15:26:47 GMT
On Mon, Apr 18, 2016 at 3:52 PM, John Bickerstaff
<john@johnbickerstaff.com> wrote:
> Thanks all - very helpful.
>
> @Shawn - your reply implies that even if I'm hitting the URL for a single
> endpoint via HTTP - the "balancing" will still occur across the Solr Cloud
> (I understand the caveat about that single endpoint being a potential point
> of failure).  I just want to verify that I'm interpreting your response
> correctly...
>
> (I have been asked to provide IT with a comprehensive list of options prior
> to a design discussion - which is why I'm trying to get clear about the
> various options)
>
> In a nutshell, I think I understand the following:
>
> a. Even if hitting a single URL, the Solr Cloud will "balance" across all
> available nodes for searching
>           Caveat: That single URL represents a potential single point of
> failure and this should be taken into account
>
> b. SolrJ's CloudSolrClient API provides the ability to distribute load --
> based on Zookeeper's "knowledge" of all available Solr instances.
>           Note: This is more robust than "a" due to the fact that it
> eliminates the "single point of failure"
>
> c.  Use of a load balancer hitting all known Solr instances will be fine -
> although the search requests may not run on the Solr instance the load
> balancer targeted - due to "a" above.
>
> Corrections or refinements welcomed...

With option a), although queries will be distributed across the
cluster, all queries will be going through that single node. Not only
is that a single point of failure, but you risk saturating the
inter-node network traffic, possibly resulting in lower QPS and higher
latency on your queries.

With option b), as well as SolrJ, recent versions of pysolr have a
ZK-aware SolrCloud client that behaves in a similar way.

With option c), you can use the preferLocalShards so that shards that
are local to the queried node are used in preference to distributed
shards. Depending on your shard/cluster topology, this can increase
performance if you are returning large amounts of data - many or large
fields or many documents.

Cheers

Tom

Mime
View raw message