lucene-solr-user mailing list archives

From Erick Erickson <>
Subject Re: Taking Solr to production
Date Sat, 23 Jan 2016 02:20:56 GMT
It boils down to whether the response rate when you query a single
shard is "acceptable", plus some overhead for sharding.

So, if you need 100 QPS and all you can get after tuning a single
shard (which you can test with &distrib=false)
is 10 QPS, you need 10 replicas.
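
To get that single-shard number, one option is to send queries directly to one core with the `distrib=false` request parameter, so that core answers on its own instead of fanning the request out to the rest of the collection. The Python sketch below is only an illustration. The base URL, the core name `mycollection_shard1_replica1` and the query are placeholders, and `measure_qps` assumes you pass it your own function that performs one real request (for example via `urllib.request.urlopen`).

```python
import time
from concurrent.futures import ThreadPoolExecutor
from urllib.parse import urlencode


def single_shard_url(base_url: str, core: str, query: str) -> str:
    """Build a /select URL that asks one core to answer without distributing the query."""
    params = {"q": query, "distrib": "false", "wt": "json"}
    return f"{base_url.rstrip('/')}/{core}/select?{urlencode(params)}"


def measure_qps(send_query, n_queries: int = 200, concurrency: int = 8) -> float:
    """Run send_query() n_queries times from `concurrency` threads; return queries per second."""
    start = time.perf_counter()
    with ThreadPoolExecutor(max_workers=concurrency) as pool:
        # list() waits for every call and re-raises the first worker exception, if any.
        list(pool.map(lambda _: send_query(), range(n_queries)))
    elapsed = time.perf_counter() - start
    return n_queries / elapsed


if __name__ == "__main__":
    # Placeholder host and core name; substitute your own.
    url = single_shard_url("http://localhost:8983/solr",
                           "mycollection_shard1_replica1", "title:solr")
    print(url)
    # Against a running Solr, something like:
    #   import urllib.request
    #   qps = measure_qps(lambda: urllib.request.urlopen(url).read())
```

Timing queries one after another would only measure latency; the thread pool approximates concurrent load, which is closer to what a QPS target means. Pick `concurrency` to resemble your expected client load.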

But if a single shard can only get you responses back in 10 seconds,
you need more shards.

And so on...
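
The rule of thumb above reduces to two separate checks: single-shard latency decides whether you need more shards, and single-shard throughput decides how many replicas you need. A minimal Python sketch, assuming you have already measured both numbers on one shard; the function name and the latency budget are illustrative, and it ignores the extra overhead of going distributed, so treat the replica count as a lower bound.

```python
import math


def plan_cluster(target_qps: float, single_shard_qps: float,
                 single_shard_latency_s: float, max_latency_s: float) -> dict:
    """Latency drives the shard decision, throughput drives the replica count.

    Ignores the extra overhead of multi-shard queries, so the replica
    count is a lower bound.
    """
    if single_shard_qps <= 0:
        raise ValueError("single_shard_qps must be positive")
    # Each replica of a shard is assumed to serve about single_shard_qps on its own.
    replicas = math.ceil(target_qps / single_shard_qps)
    # If one shard is too slow per query, more replicas will not help; split the index.
    needs_more_shards = single_shard_latency_s > max_latency_s
    return {"replicas_per_shard": replicas, "needs_more_shards": needs_more_shards}


# 100 QPS needed, 10 QPS per shard after tuning:
print(plan_cluster(100, 10, single_shard_latency_s=0.2, max_latency_s=1.0))
# {'replicas_per_shard': 10, 'needs_more_shards': False}

# A single shard that takes 10 seconds to answer:
print(plan_cluster(100, 10, single_shard_latency_s=10.0, max_latency_s=1.0))
# {'replicas_per_shard': 10, 'needs_more_shards': True}
```

Keeping the two checks separate reflects the point debated in the thread: adding replicas raises total throughput, but it does not make an individual slow query any faster.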


On Fri, Jan 22, 2016 at 3:30 PM, Aswath Srinivasan (TMS)
<> wrote:
> Thanks guys for all the responses.
> True. What I wanted to convey is 2 shards with 4 replicas.
>>> use more shards if the query latency is too high.
> Shouldn't we go for more replicas if query latency is too high? You can go for more shards if you have a large number of documents to index and index them at a high rate. Do you disagree with my point of view?
> There are no facets, but there are complex queries. I was thinking a safe bet is 2 shards, to give the indexing jobs enough breathing space, and 4 replicas to handle the high QPS. Am I thinking correctly?
> I cannot thank you enough you guys!!
> Thank you,
> Aswath NS
> -----Original Message-----
> From: Jack Krupansky []
> Sent: Friday, January 22, 2016 3:06 PM
> To:
> Subject: Re: Taking Solr to production
> "1 Leader & 3 Replicas"
> SolrCloud does not distinguish leaders from replicas - that's old master-slave terminology. The leader is just one of the replicas.
> So, are you really talking about 2 shards with 4 replicas each, or 2 shards with 2 replicas?
> Putting multiple replica instances on each machine isn't buying you anything, just making it more complicated to manage.
> The number of shards is determined by the amount of data and by whether the target query latency can be achieved - use more shards if the query latency is too high.
> 2.5 million (2,500,000) documents is rather small, so unless your queries are running really slow, it's not clear you even need sharding, but we don't know your document and query complexity. Heavy faceting or complex function queries?
> The number of replicas is determined by query load (the number of simultaneous query requests) as well as by high-availability (HA) requirements.
> -- Jack Krupansky
> On Fri, Jan 22, 2016 at 5:45 PM, Toke Eskildsen
> wrote:
>> Aswath Srinivasan (TMS) wrote:
>> > * Totally about 2.5 million documents to be indexed
>> > * Documents average size is 512 KB - pdfs and htmls
>> > That said, I was thinking of taking Solr to production with:
>> > * 2 shards, 1 Leader & 3 Replicas
>> > Do you all think this setup will work? Will this serve me 150 QPS?
>> It certainly helps that you are batch updating. What is missing in
>> this estimation is how large the documents are when indexed, as I
>> guess the ½MB average is for the raw files? If they are your everyday
>> short PDFs with images, meaning not a lot of text, handling 2M+ of
>> them is easy. If they are all full-length books, it is another matter.
>> Your document count is relatively low and if your index data end up
>> being not-too-big (let's say 100GB), then you ought to consider having
>> just a single shard with 4 replicas: There is a non-trivial overhead
>> going from 1 shard to more than one, especially if you are doing faceting.
>> - Toke Eskildsen
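
The index-size and replica points made in the thread can be combined into a rough checklist: estimate the raw corpus size, measure the actual index size, consider staying on one shard while the index is modest, and set the replica count from query load with a floor for high availability. The Python sketch below is illustrative only. The 100 GB threshold is just the "let's say" figure from the thread, the HA floor of 2 replicas is an assumed placeholder, and the indexed size must come from your own test index, because the ½MB average describes the raw PDF/HTML files rather than the extracted text.

```python
import math


def raw_corpus_gib(num_docs: int, avg_doc_kib: float) -> float:
    """Raw size of the source files in GiB (not the size of the Solr index)."""
    return num_docs * avg_doc_kib / 1024 ** 2


def suggest_layout(indexed_gb: float, target_qps: float, per_replica_qps: float,
                   min_replicas_for_ha: int = 2,
                   single_shard_max_gb: float = 100.0) -> dict:
    """Rough layout check; both thresholds are assumptions to tune, not Solr limits."""
    if per_replica_qps <= 0:
        raise ValueError("per_replica_qps must be positive")
    # Prefer one shard while the index is modest: multi-shard queries add overhead.
    single_shard_ok = indexed_gb <= single_shard_max_gb
    # Replicas cover query load, but never drop below the (assumed) HA floor.
    replicas = max(math.ceil(target_qps / per_replica_qps), min_replicas_for_ha)
    return {"single_shard_ok": single_shard_ok, "replicas_per_shard": replicas}


# 2.5 million documents at ~512 KiB each is roughly 1.2 TiB of raw files.
print(f"{raw_corpus_gib(2_500_000, 512):.0f} GiB raw")
# Hypothetical measurements: 60 GB index, 150 QPS target, 40 QPS per replica.
print(suggest_layout(indexed_gb=60, target_qps=150, per_replica_qps=40))
```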