lucene-solr-user mailing list archives

From Robert Brown <...@intelcompute.com>
Subject Re: Querying only replica's
Date Sun, 10 Jan 2016 13:19:06 GMT
Thanks Erick,

For the health-checks on the load-balancer side, would you recommend a 
simple query, or is there a reliable ping or similar for this scenario?

Cheers,
Rob
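
One possible health check, sketched under assumptions: Solr ships a ping request handler, and the sketch below assumes it is reachable at /admin/ping on each collection and returns a JSON body with a top-level "status" field. The base URL, the collection name, and the helper names (ping_url, is_healthy, check_node) are illustrative and do not come from this thread.

```python
import json
import urllib.error
import urllib.request


def ping_url(base_url: str, collection: str) -> str:
    """Build the ping URL for one collection on one Solr node (assumed path)."""
    return f"{base_url.rstrip('/')}/{collection}/admin/ping?wt=json"


def is_healthy(body: dict) -> bool:
    """Treat the node as healthy only if the ping body reports status OK."""
    return body.get("status") == "OK"


def check_node(base_url: str, collection: str, timeout: float = 2.0) -> bool:
    """Return True if the node answers the ping in time with status OK."""
    try:
        with urllib.request.urlopen(ping_url(base_url, collection), timeout=timeout) as resp:
            return resp.status == 200 and is_healthy(json.load(resp))
    except (urllib.error.URLError, OSError, ValueError):
        # Connection refused, timeout, HTTP error, or a non-JSON body: treat as down.
        return False
```

A load balancer that supports HTTP checks could call the same URL directly and look for a 200 response; the Python wrapper is only useful where the check has to be scripted.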


On 09/01/16 23:44, Erick Erickson wrote:
> bq: is it best/good to get the CLUSTERSTATUS via the collection API
> and explicitly send queries to a replica to ensure I don't send
> queries to the leaders of my collection
>
> In a word _no_. SolrCloud is vastly different than the old
> master/slave. In SolrCloud, each and every node (leader and replicas)
> indexes all the docs and serves queries. The additional burden the leader
> has is actually very small. There's absolutely no reason to _not_ use
> the leader to serve queries.
>
> As far as sending updates, there would be a _little_ benefit to
> sending the updates directly to the leader, but _far_ more benefit in
> using SolrJ. If you use SolrJ (and CloudSolrClient), then the
> documents are split up on the _client_ and only the docs for a
> particular shard are automatically sent to the leader for that shard.
> Using SolrJ you can essentially scale indexing linearly with the
> number of shards you have. Just using HTTP does not scale linearly.
> Your particular app may not care, but in high-throughput situations
> this can be significant.
>
> So rather than spend time and effort sending updates directly to a
> leader and have the leader then forward the docs to the correct shard,
> I recommend investing the time in using SolrJ for updates rather than
> sending updates to the leader over HTTP. Or just ignore the problem
> and devote your efforts to something that is more valuable.
>
> So in short:
> 1> just stick a load balancer in front of _all_ your Solr nodes for
> queries. And note that there's an internal load balancer already in
> Solr that routes things around anyway, although putting a load
> balancer in front of your entire cluster makes it so there's not a
> single point of failure.
> 2> Depending on your throughput needs, either
> 2a> use SolrJ to index
> 2b> don't worry about it and send updates through the load balancer as
> well. There'll be an extra hop if you send updates to a replica, but
> if that's significant you should be using SolrJ.
>
> As for 5.5, it's not at all clear that there _will_ be a 5.5. 5.4 was
> just released in early December. There's usually a several month lag
> between point releases and there's some agitation to start the 6.0
> release process, so it's up in the air.
>
>
> On Sat, Jan 9, 2016 at 12:04 PM, Robert Brown <rob@intelcompute.com> wrote:
>> Hi,
>>
>> (btw, when is 5.5 due?  I see the docs reference it, but not the download
>> page)
>>
>> Anyway, I index and query Solr over HTTP (no SolrJ, etc.) - is it best/good
>> to get the CLUSTERSTATUS via the collection API and explicitly send queries
>> to a replica to ensure I don't send queries to the leaders of my collection,
>> to improve performance?  Likewise with sending updates directly to a
>> Leader?
>>
>> My leaders will receive full updates of the entire collection once a day, so
>> I would assume if the leader is handling queries too, performance would be
>> hit?
>>
>> Is the CLUSTERSTATUS API the only way to do this btw without SolrJ, etc.?  I
>> wasn't sure if ZooKeeper would be able to tell me also.
>>
>> Do I also need to do anything to ensure the leaders are never sent queries
>> from the replicas?
>>
>> Does this all sound sane?
>>
>> One of my collections is 3 shards, with 2 replicas each (9 total nodes),
>> 70m docs in total.
>>
>> Thanks,
>> Rob
>>
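
For readers who only want to inspect which replicas are currently leaders (the reply in this thread advises against routing queries on that basis), here is a minimal sketch of reading a CLUSTERSTATUS response that has already been fetched and decoded from JSON. The response layout used here (cluster, collections, shards, replicas, with "base_url", "state", and a string "leader" flag) is an assumption about the Collections API output and should be checked against your Solr version; the function name replicas_by_role is hypothetical.

```python
def replicas_by_role(cluster_status: dict, collection: str) -> dict:
    """Split the active replicas of a collection into leaders and non-leaders.

    Assumes the decoded CLUSTERSTATUS layout described in the lead-in.
    """
    shards = cluster_status["cluster"]["collections"][collection]["shards"]
    roles = {"leaders": [], "followers": []}
    for shard in shards.values():
        for replica in shard["replicas"].values():
            if replica.get("state") != "active":
                continue  # skip replicas that are down or recovering
            key = "leaders" if replica.get("leader") == "true" else "followers"
            roles[key].append(replica["base_url"])
    return roles


# Hand-written sample in the assumed layout, not real cluster output.
sample = {
    "cluster": {"collections": {"products": {"shards": {"shard1": {"replicas": {
        "core_node1": {"base_url": "http://a:8983/solr", "state": "active", "leader": "true"},
        "core_node2": {"base_url": "http://b:8983/solr", "state": "active"},
        "core_node3": {"base_url": "http://c:8983/solr", "state": "down"},
    }}}}}}
}
print(replicas_by_role(sample, "products"))
```

Leadership can change whenever a node restarts, so any list built this way is a snapshot and goes stale.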

