lucene-solr-user mailing list archives

From Erick Erickson <erickerick...@gmail.com>
Subject Re: Load Balancing between Two Cloud Clusters
Date Tue, 01 May 2018 15:33:59 GMT
Glad to help. Yeah, I thought you might have been making it harder
than it needed to be ;).

In SolrCloud you're constantly running up against "it's just magic
until it's not"; knowing when the magic applies and when it doesn't
can be tricky, very tricky.

Basically, when using LBs, people just add nodes to the LB as they
come up. If a Solr endpoint isn't available, it's skipped.
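
To make that concrete, here's a minimal sketch of what that looks like in HAProxy, using Solr's ping request handler as the health probe. The backend name, server addresses, and collection name are all made up; adjust to your own cluster:

```
# Hypothetical HAProxy backend: every Solr node from both clusters goes in,
# and the ping handler decides whether a node stays in rotation.
backend solr_query
    balance roundrobin
    option httpchk GET /solr/myCollection/admin/ping
    server solr1 10.0.0.1:8983 check
    server solr2 10.0.0.2:8983 check
    server solr3 10.0.1.1:8983 check
    server solr4 10.0.1.2:8983 check
```

Any node that stops answering the ping is dropped automatically; when it comes back, it rejoins the pool.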

I'll also add that SolrJ, the CloudSolrClient specifically, does all
of this on the client side. It's ZK-aware, so it knows the topology of
the active Solr nodes and "does the right thing" via internal LBs.
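
A minimal SolrJ sketch of that, assuming SolrJ 7.x on the classpath; the ZooKeeper ensemble address and collection name here are invented:

```java
import java.util.Arrays;
import java.util.Optional;

import org.apache.solr.client.solrj.SolrQuery;
import org.apache.solr.client.solrj.impl.CloudSolrClient;
import org.apache.solr.client.solrj.response.QueryResponse;

public class CloudQueryExample {
    public static void main(String[] args) throws Exception {
        // CloudSolrClient reads cluster state from ZooKeeper, so it always
        // knows which nodes host live replicas -- no external LB required.
        try (CloudSolrClient client = new CloudSolrClient.Builder(
                Arrays.asList("zk1:2181,zk2:2181,zk3:2181"),
                Optional.empty()).build()) {
            client.setDefaultCollection("myCollection");
            QueryResponse rsp = client.query(new SolrQuery("*:*"));
            System.out.println("hits: " + rsp.getResults().getNumFound());
        }
    }
}
```

The client load-balances and retries across live replicas on its own, which is why an external LB only really matters for non-SolrJ clients or for spanning two separate clusters.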

Best,
Erick

On Tue, May 1, 2018 at 6:41 AM, Monica Skidmore
<Monica.Skidmore@careerbuilder.com> wrote:
> Thank you, Erick.  This is exactly the information I needed but hadn't correctly parsed as a new Solr cloud user.  You've just made setting up our new configuration much easier!!
>
> Monica Skidmore
> Senior Software Engineer
>
>
>
> On 4/30/18, 7:29 PM, "Erick Erickson" <erickerickson@gmail.com> wrote:
>
>     "We need a way to determine that a node is still 'alive' and should be
>     in the load balancer, and we need a way to know that a new node is now
>     available and fully ready with its replicas to add to the load
>     balancer."
>
>     Why? If a Solr node is running but its replicas aren't up yet, it'll
>     pass the request along to a node that _does_ have live replicas; you
>     don't have to do anything. As for knowing the node is alive, there
>     are lots of ways: any API endpoint needs a running Solr to field it,
>     so perhaps just use the Collections LIST command?
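
As a sketch of that kind of LB probe (plain JDK, no SolrJ; the host and port are hypothetical), hitting the Collections API LIST endpoint and treating an HTTP 200 as "alive":

```java
import java.net.HttpURLConnection;
import java.net.URL;

public class SolrProbe {
    // The Collections API LIST call is fielded by any running Solr node,
    // so a 200 here means "this node is up", not "its replicas are ready"
    // -- which is fine, since Solr forwards to live replicas anyway.
    static String listUrl(String hostPort) {
        return "http://" + hostPort + "/solr/admin/collections?action=LIST";
    }

    static boolean isAlive(String hostPort) {
        try {
            HttpURLConnection conn =
                    (HttpURLConnection) new URL(listUrl(hostPort)).openConnection();
            conn.setConnectTimeout(2000);
            conn.setReadTimeout(2000);
            return conn.getResponseCode() == 200;
        } catch (Exception e) {
            return false;  // unreachable nodes drop out of the LB
        }
    }

    public static void main(String[] args) {
        System.out.println(isAlive(args.length > 0 ? args[0] : "localhost:8983"));
    }
}
```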
>
>     "How does ZooKeeper make this determination?  Does it do something
>     different if multiple collections are on a single cluster?  And, even
>     with just one cluster, what is best practice for keeping a current
>     list of active nodes in the cluster, especially for extremely high
>     query rates?"
>
>     This is a common misconception. ZooKeeper isn't interested in Solr at
>     all. ZooKeeper will ping the nodes it knows about and, perhaps, remove
>     a node from the live_nodes list, but that's all. It isn't involved in
>     Solr's operation in terms of routing queries, updates or anything like
>     that.
>
>     _Solr_ keeps track of all this by _watching_ various znodes. Say a
>     Solr node hosts some replica in a collection. When it comes up, it
>     sets a "watch" on the /collections/my_collection/state.json znode. It
>     also publishes its own state. So say it hosts three replicas for the
>     collection: as each one is loaded and ready for action, Solr posts an
>     update to the relevant state.json file.
>
>     ZooKeeper is then responsible for telling any other node that set a
>     watch that the znode has changed. ZK doesn't know or care whether
>     those are Solr nodes or not.
>
>     So when a request comes in to a Solr node, it knows which other Solr
>     nodes host which replicas and does all the sub-requests itself; ZK
>     isn't involved at all at that level.
>
>     So imagine node1 hosts S1R1 and S2R1, and node2 hosts S1R2 and S2R2
>     (for collection A). When node1 comes up, it updates the state in ZK
>     to say S1R1 and S2R1 are "active". Now say node2 is coming up but
>     hasn't loaded its cores yet. If it receives a request, it can forward
>     it on to node1.
>
>     Now node2 loads both its cores. It updates the ZK node for the
>     collection, and since node1 is watching, it fetches the updated
>     state.json. From this point forward, both nodes have complete
>     information about all the replicas in the collection and don't need to
>     reference ZK any more at all.
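
This bookkeeping can be modeled in a few lines of plain Java. To be clear, this is a toy stand-in for state.json and watch notifications, not Solr's actual code; the replica and node names match the walkthrough above:

```java
import java.util.LinkedHashMap;
import java.util.Map;

public class ClusterStateToy {
    // Toy stand-in for state.json: replica name -> "nodeName:state".
    static final Map<String, String> state = new LinkedHashMap<>();

    // Toy stand-in for publishing state plus ZK notifying watchers:
    // every node sees the same map as soon as it changes.
    static void publish(String replica, String node, String replicaState) {
        state.put(replica, node + ":" + replicaState);
    }

    // Any node can answer a query for a shard by forwarding to a node
    // that has an *active* replica of it.
    static String routeShard(String shard) {
        for (Map.Entry<String, String> e : state.entrySet()) {
            if (e.getKey().startsWith(shard) && e.getValue().endsWith(":active")) {
                return e.getValue().split(":")[0];
            }
        }
        return null;  // no active replica anywhere
    }

    public static void main(String[] args) {
        publish("S1R1", "node1", "active");   // node1 comes up first
        publish("S2R1", "node1", "active");
        publish("S1R2", "node2", "down");     // node2 still loading cores
        System.out.println(routeShard("S1")); // only node1 can serve S1
        publish("S1R2", "node2", "active");   // node2 finishes loading
        System.out.println(routeShard("S1")); // now either node could serve it
    }
}
```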
>
>     In fact, ZK can completely go away and _queries_ can continue to work
>     off their cached state.json. Updates will fail since ZK quorums are
>     required for updates to indexes to prevent "split brain" problems.
>
>     Best,
>     Erick
>
>     On Mon, Apr 30, 2018 at 11:03 AM, Monica Skidmore
>     <Monica.Skidmore@careerbuilder.com> wrote:
>     > Thank you, Erick.  That confirms our understanding for a single cluster, or once we select a node from one of the two clusters to query.
>     >
>     > As we try to set up an external load balancer to go between two clusters, though, we still have some questions.  We need a way to determine that a node is still 'alive' and should be in the load balancer, and we need a way to know that a new node is now available and fully ready with its replicas to add to the load balancer.
>     >
>     > How does ZooKeeper make this determination?  Does it do something different if multiple collections are on a single cluster?  And, even with just one cluster, what is best practice for keeping a current list of active nodes in the cluster, especially for extremely high query rates?
>     >
>     > Again, if there's some good documentation on this, I'd love a pointer...
>     >
>     > Monica Skidmore
>     > Senior Software Engineer
>     >
>     >
>     >
>     > On 4/30/18, 1:09 PM, "Erick Erickson" <erickerickson@gmail.com> wrote:
>     >
>     >     Multiple clusters with the same dataset aren't load-balanced by Solr;
>     >     you'll have to accomplish that from "outside", e.g. something that sends
>     >     queries to each cluster.
>     >
>     >     _Within_ a cluster (collection), as long as a request gets to any Solr
>     >     node, sub-requests are distributed with an internal software LB. As far
>     >     as a single collection goes, you're fine just sending any query to any
>     >     node. Even if you send a query to a node that hosts no replicas for a
>     >     collection, Solr will "do the right thing" and forward it appropriately.
>     >
>     >     HTH,
>     >     Erick
>     >
>     >     On Mon, Apr 30, 2018 at 9:46 AM, Monica Skidmore <
>     >     Monica.Skidmore@careerbuilder.com> wrote:
>     >
>     >     > We are migrating from a master-slave configuration to Solr cloud (7.3) and have questions about the preferred way to load balance between the two clusters.  It looks like we want to use a load balancer that directs queries to any of the server nodes in either cluster, trusting that node to handle the query correctly – true?  If we auto-scale nodes into the cluster, are there considerations about when a node becomes ‘ready’ to query from a Solr perspective and when it is added to the load balancer?  Also, what is the preferred method of doing a health-check for the load balancer – would it be “bin/solr healthcheck -c myCollection”?
>     >     >
>     >     > Pointers in the right direction – especially to any documentation on running multiple clusters with the same dataset – would be appreciated.
>     >     >
>     >     >
>     >     >
>     >     > *Monica Skidmore*
>     >     > *Senior Software Engineer*
>     >     >
>     >
>     >
>
>
