lucene-solr-user mailing list archives

From "Aswath Srinivasan (TMS)" <aswath.sriniva...@toyota.com>
Subject RE: Taking Solr to production
Date Fri, 22 Jan 2016 23:30:14 GMT
Thanks guys for all the responses.

True. What I meant to convey is 2 shards with 4 replicas each.

>> use more shards if the query latency is too high.

Shouldn't we go for more replicas if query latency is too high? You can go for more shards
if you are indexing a large number of documents, and at a much more frequent rate. Do you
disagree with my point of view?

There are no facets, but complex queries exist. I was thinking that a safe bet is to have
2 shards, to give enough breathing space for the indexing jobs, and 4 replicas to handle
the high QPS. Am I thinking correctly?
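
To make the layout concrete, here is a minimal sketch of the Collections API CREATE request
for 2 shards with 4 replicas each (8 cores in total). The host, port and the collection name
"docs" are assumptions for illustration only; the script just builds the URL and does not
contact a server.

```python
from urllib.parse import urlencode


def create_collection_url(base_url, name, num_shards, replication_factor):
    """Build a Solr Collections API CREATE URL (nothing is sent)."""
    params = {
        "action": "CREATE",
        "name": name,                             # hypothetical collection name
        "numShards": num_shards,                  # how the index is split
        "replicationFactor": replication_factor,  # copies per shard, leader included
    }
    return f"{base_url}/admin/collections?{urlencode(params)}"


# Assumed local Solr node; replace with a real node address.
url = create_collection_url("http://localhost:8983/solr", "docs", 2, 4)
print(url)
```

Changing only replicationFactor adds query capacity without re-splitting the index, while
changing numShards changes how the documents themselves are partitioned.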

I cannot thank you guys enough!!

Thank you,
Aswath NS


-----Original Message-----
From: Jack Krupansky [mailto:jack.krupansky@gmail.com]
Sent: Friday, January 22, 2016 3:06 PM
To: solr-user@lucene.apache.org
Subject: Re: Taking Solr to production

"1 Leader & 3 Replicas"

SolrCloud does not distinguish leaders from replicas - that's old master-slave terminology.
The leader is just one of the replicas.

So, are you really talking about 2 shards with 4 replicas each or 2 shards with 2 replicas
each?

Putting multiple replica instances on each machine isn't buying you anything, just making
it more complicated to manage.

Number of shards is determined by the amount of data and whether the target query latency can
be achieved - use more shards if the query latency is too high.

2.5 million (2,500,000) documents is rather small, so unless your queries are running really
slow, it's not clear you even need sharding, but we don't know your document and query complexity.
Heavy faceting or complex function queries?

Number of replicas is determined by query load - the number of simultaneous query requests -
as well as HA requirements.




-- Jack Krupansky

On Fri, Jan 22, 2016 at 5:45 PM, Toke Eskildsen wrote:

> Aswath Srinivasan (TMS) wrote:
> > * Totally about 2.5 million documents to be indexed
> > * Documents average size is 512 KB - pdfs and htmls
>
> > This being said I was thinking I would take the Solr to production with,
> > * 2 shards, 1 Leader & 3 Replicas
>
> > Do you all think this setup will work? Will this serve me 150 QPS?
>
> It certainly helps that you are batch updating. What is missing in
> this estimation is how large the documents are when indexed, as I
> guess the ½MB average is for the raw files? If they are your everyday
> short PDFs with images, meaning not a lot of text, handling 2M+ of
> them is easy. If they are all full-length books, it is another matter.
>
> Your document count is relatively low and if your index data end up
> being not-too-big (let's say 100GB), then you ought to consider having
> just a single shard with 4 replicas: There is a non-trivial overhead
> going from 1 shard to more than one, especially if you are doing faceting.
>
> - Toke Eskildsen
>
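
As a rough back-of-the-envelope check on the sizes quoted above, the sketch below computes the
raw corpus size from the two stated figures (2.5 million documents, 512 KB average) and the
index-to-raw ratio at which the index would land on the 100GB example. It treats KB and GB as
binary units, which is an assumption, and it says nothing about what the real ratio will be;
that depends on how much text is extracted from the PDFs and HTML files.

```python
DOC_COUNT = 2_500_000    # documents to index, from the thread
AVG_DOC_KIB = 512        # average raw document size, from the thread
EXAMPLE_INDEX_GIB = 100  # the "not-too-big" example figure, read as GiB (assumption)

# Raw corpus size in GiB: documents * KiB per document / (KiB per GiB)
raw_gib = DOC_COUNT * AVG_DOC_KIB / 1024**2

# Ratio of index size to raw size that would yield the example index size
ratio_for_example = EXAMPLE_INDEX_GIB / raw_gib

print(f"raw corpus: {raw_gib:.1f} GiB")
print(f"index/raw ratio for a {EXAMPLE_INDEX_GIB} GiB index: {ratio_for_example:.2%}")
```

So the raw files come to roughly 1.2 TiB, and the index would have to be around 8% of that to
match the 100GB example.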
