lucene-solr-user mailing list archives

From John Bickerstaff <j...@johnbickerstaff.com>
Subject Re: Verifying - SOLR Cloud replaces load balancer?
Date Mon, 18 Apr 2016 17:41:44 GMT
Nice - thanks Daniel.

On Mon, Apr 18, 2016 at 11:38 AM, Davis, Daniel (NIH/NLM) [C] <daniel.davis@nih.gov> wrote:

> One thing I like about SolrCloud is that I don't have to configure
> Master/Slave replication in each "core" the same way to get them to
> replicate.
>
> The other thing I like about SolrCloud, which is largely theoretical at
> this point, is that I don't need to test changes to a collection's
> configuration by bringing up a whole new Solr on a whole new server.
> SolrCloud already virtualizes this, so I can make up a random collection
> name that doesn't conflict, create the collection, and smoke test with
> it.   I know that standard practice is to bring up all new nodes, but I
> don't see why this is needed.
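Daniel's throwaway-collection idea maps onto the Collections API. Below is a minimal Python sketch, assuming a configset named `myconf` (hypothetical) has already been uploaded to ZooKeeper and a node is reachable at `http://localhost:8983/solr`. It only builds the CREATE and DELETE URLs and leaves the HTTP call to whatever client you use; the `smoketest_` prefix and the 8-hex-character random suffix are illustrative choices, not anything Solr requires.

```python
import secrets
from urllib.parse import urlencode


def smoke_test_urls(base_url, config_name, num_shards=1,
                    replication_factor=1, prefix="smoketest"):
    """Build Collections API URLs to create, and later delete, a throwaway
    collection. The random suffix makes a name clash very unlikely."""
    name = f"{prefix}_{secrets.token_hex(4)}"  # e.g. smoketest_1a2b3c4d
    admin = base_url.rstrip("/") + "/admin/collections?"
    create = admin + urlencode({
        "action": "CREATE",
        "name": name,
        "numShards": num_shards,
        "replicationFactor": replication_factor,
        # assumption: this configset already exists in ZooKeeper
        "collection.configName": config_name,
    })
    delete = admin + urlencode({"action": "DELETE", "name": name})
    return name, create, delete
```

Usage: `name, create, delete = smoke_test_urls("http://localhost:8983/solr", "myconf")`, then issue `create` (for example with `urllib.request.urlopen(create)`), run the smoke tests against `name`, and issue `delete` when done.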
>
> -----Original Message-----
> From: John Bickerstaff [mailto:john@johnbickerstaff.com]
> Sent: Monday, April 18, 2016 1:23 PM
> To: solr-user@lucene.apache.org
> Subject: Re: Verifying - SOLR Cloud replaces load balancer?
>
> So - my IT guy makes the case that we don't really need Zookeeper / Solr
> Cloud...
>
> He may be right - we're serving static data (changes to the collection
> occur only 2 or 3 times a year and are minor)
>
> We could probably run 3 or 4 Solr nodes in non-Cloud mode, each
> configured the same way, behind a load balancer, and do fine.
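What the non-Cloud proposal relies on is the load balancer handing each request to one of several identically configured nodes. A toy round-robin version of that behavior in Python, assuming hypothetical hostnames `solr1` to `solr3`; a real balancer would also run its own health checks, which this sketch leaves to the caller via `mark_down` / `mark_up`.

```python
import itertools


class RoundRobinBalancer:
    """Hand out identically configured Solr nodes in turn, skipping any
    node currently marked as down (a toy stand-in for a real balancer)."""

    def __init__(self, nodes):
        if not nodes:
            raise ValueError("need at least one node")
        self._nodes = list(nodes)
        self._down = set()
        self._cycle = itertools.cycle(self._nodes)

    def mark_down(self, node):
        self._down.add(node)

    def mark_up(self, node):
        self._down.discard(node)

    def next_node(self):
        # Look at each node at most once per call so we never loop forever.
        for _ in range(len(self._nodes)):
            node = next(self._cycle)
            if node not in self._down:
                return node
        raise RuntimeError("no healthy Solr nodes")
```

Because every node holds the full index in this setup, any node can answer any query, which is what makes plain round-robin sufficient.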
>
> I've got a Kafka server set up with the solr docs as topics.  It takes
> about 10 minutes to reload a "blank" Solr Server from the Kafka topic...
> If I target 3-4 SOLR servers from my microservice instead of one, it
> wouldn't take much longer than 10 minutes to concurrently reload all 3 or 4
> Solr servers from scratch...
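The concurrent reload described above amounts to a fan-out: read the document stream once and push it to every server in parallel, so wall-clock time is roughly that of the slowest single reload rather than the sum (assuming the consumer side and the network are not the bottleneck). A minimal Python sketch; `index_into` is a hypothetical callable standing in for the real indexing code, and consuming the Kafka topic is not shown.

```python
from concurrent.futures import ThreadPoolExecutor


def reload_all(servers, docs, index_into):
    """Push the same documents into every server concurrently.

    `index_into(server, docs)` is a hypothetical callable that indexes
    `docs` into one Solr server (e.g. by POSTing batches to its update
    handler) and returns the number of documents it sent.
    """
    docs = list(docs)  # materialize once so every server sees the same docs
    with ThreadPoolExecutor(max_workers=max(1, len(servers))) as pool:
        futures = {s: pool.submit(index_into, s, docs) for s in servers}
        # .result() re-raises any exception raised in a worker thread
        return {s: f.result() for s, f in futures.items()}
```

If one server fails, `reload_all` raises, and that server can simply be reloaded again on its own.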
>
> I'm biased in terms of using the most recent functionality, but I'm aware
> that bias is not necessarily based on facts and want to do my due
> diligence...
>
> Aside from the obvious benefits of spreading work across nodes (which may
> not be a big deal in our application and which my IT guy proposes is more
> transparently handled with a load balancer he understands) are there any
> other considerations that would drive a choice for Solr Cloud (zookeeper
> etc)?
>
>
>
> On Mon, Apr 18, 2016 at 9:26 AM, Tom Evans <tevans.uk@googlemail.com> wrote:
>
> > On Mon, Apr 18, 2016 at 3:52 PM, John Bickerstaff
> > <john@johnbickerstaff.com> wrote:
> > > Thanks all - very helpful.
> > >
> > > @Shawn - your reply implies that even if I'm hitting the URL for a
> > > single endpoint via HTTP - the "balancing" will still occur across
> > > the Solr Cloud (I understand the caveat about that single endpoint
> > > being a potential point of failure).  I just want to verify that I'm
> > > interpreting your response correctly...
> > >
> > > (I have been asked to provide IT with a comprehensive list of
> > > options prior to a design discussion - which is why I'm trying to
> > > get clear about the various options)
> > >
> > > In a nutshell, I think I understand the following:
> > >
> > > a. Even if hitting a single URL, the Solr Cloud will "balance"
> > >    across all available nodes for searching
> > >           Caveat: That single URL represents a potential single
> > >           point of failure and this should be taken into account
> > >
> > > b. SolrJ's CloudSolrClient API provides the ability to distribute
> > >    load -- based on Zookeeper's "knowledge" of all available Solr
> > >    instances.
> > >           Note: This is more robust than "a" due to the fact that it
> > >           eliminates the "single point of failure"
> > >
> > > c. Use of a load balancer hitting all known Solr instances will be
> > >    fine - although the search requests may not run on the Solr
> > >    instance the load balancer targeted - due to "a" above.
> > >
> > > Corrections or refinements welcomed...
> >
> > With option a), although queries will be distributed across the
> > cluster, all queries will be going through that single node. Not only
> > is that a single point of failure, but you risk saturating the
> > inter-node network traffic, possibly resulting in lower QPS and higher
> > latency on your queries.
> >
> > With option b), as well as SolrJ, recent versions of pysolr have a
> > ZK-aware SolrCloud client that behaves in a similar way.
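For option b), the core idea of a ZK-aware client is to learn the set of live nodes from ZooKeeper and spread requests across them, failing over when one does not answer. The Python sketch below assumes the node names have already been read from the `/live_nodes` znode (the ZooKeeper call itself is not shown) and that they follow the `host:port_context` form, e.g. `10.0.0.5:8983_solr`; `send` is a hypothetical callable that runs the actual query. It illustrates the idea only and is not how SolrJ or pysolr implement it.

```python
import random


def node_name_to_url(node_name, scheme="http"):
    """Turn a live-node name such as '10.0.0.5:8983_solr' into a base URL.
    The first '_' separates host:port from the context path."""
    host_port, _, context = node_name.partition("_")
    if context:
        return f"{scheme}://{host_port}/{context}"
    return f"{scheme}://{host_port}"


def query_any_live_node(live_nodes, send, rng=random):
    """Try live nodes in random order until one answers.

    `send(base_url)` is a hypothetical callable that runs the query and
    raises OSError (e.g. ConnectionRefusedError) if the node is unreachable.
    """
    urls = [node_name_to_url(n) for n in live_nodes]
    rng.shuffle(urls)  # spread load instead of always hitting the first node
    last_error = None
    for url in urls:
        try:
            return send(url)
        except OSError as exc:
            last_error = exc  # remember the failure and try the next node
    raise RuntimeError("no live Solr node answered") from last_error
```

Since the node list comes from ZooKeeper rather than from a fixed URL, there is no single entry point to fail, which is the advantage option b) has over option a).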
> >
> > With option c), you can use the preferLocalShards parameter so that
> > shards that are local to the queried node are used in preference to
> > distributed shards. Depending on your shard/cluster topology, this can
> > increase performance if you are returning large amounts of data - many
> > or large fields or many documents.
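For option c), the parameter is passed per request. A small Python sketch that builds a select URL for one node with `preferLocalShards=true`; the node URL, the collection name `products` and the query are hypothetical. Later Solr releases deprecate `preferLocalShards` in favor of `shards.preference`, so check the Reference Guide for your version.

```python
from urllib.parse import urlencode


def local_first_query_url(node_base_url, collection, q, rows=10):
    """Build a /select URL that asks the receiving node to prefer its own
    local replicas when it fans the distributed query out to shards."""
    params = {"q": q, "rows": rows, "preferLocalShards": "true"}
    return f"{node_base_url.rstrip('/')}/{collection}/select?{urlencode(params)}"
```

For example, `local_first_query_url("http://solr1:8983/solr/", "products", "title:lamp")` gives `http://solr1:8983/solr/products/select?q=title%3Alamp&rows=10&preferLocalShards=true`, which the load balancer would then send to whichever node it picks.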
> >
> > Cheers
> >
> > Tom
> >
>
