lucene-solr-user mailing list archives

From Walter Underwood <wun...@wunderwood.org>
Subject Re: Taking Solr to production
Date Fri, 22 Jan 2016 23:06:24 GMT
I agree, sharding may hurt more than it helps. And estimate the text size after the documents
are processed.

We all love SolrCloud, but this could be a good application for traditional master/slave
Solr. That means no ZooKeeper nodes, and adding a new query slave is really easy: just clone
the instance.
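In Solr 4.x that is just the ReplicationHandler in solrconfig.xml. A minimal sketch (the master URL, poll interval, and conf file list below are placeholders, not values from this thread):

```xml
<!-- On the master: publish a new index version after each commit. -->
<requestHandler name="/replication" class="solr.ReplicationHandler">
  <lst name="master">
    <str name="replicateAfter">commit</str>
    <str name="confFiles">schema.xml,stopwords.txt</str>
  </lst>
</requestHandler>

<!-- On each slave: poll the master and pull new index versions. -->
<requestHandler name="/replication" class="solr.ReplicationHandler">
  <lst name="slave">
    <str name="masterUrl">http://master.example.com:8983/solr/collection1/replication</str>
    <str name="pollInterval">00:00:60</str>
  </lst>
</requestHandler>
```

Cloning a slave then amounts to copying the instance and pointing masterUrl at the master.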

We run an index of homework questions that seems similar to yours.

* 7 million documents.
* 50 Gbyte index.
* Request rates of 5,000 to 10,000 queries/minute per server.
* No facets or highlighting (highlighting soon; we already store term vectors).
* Amazon EC2 instances with 16 cores and 30 Gbytes of RAM; the index is on ephemeral SSD.
* Index updates once per day.
* Master/slave.
* Solr 4.10.4.
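A rough back-of-envelope extrapolation from those numbers to the 2.5 million documents in the original question; the big unknown is whether the indexed text per document is actually comparable:

```python
# Back-of-envelope index sizing from the figures above.
# Assumption: the 2.5M-document corpus has similar indexed text
# per document as our homework-question index (that is a guess).

GB = 1024 ** 3

docs = 7_000_000          # documents in our index
index_bytes = 50 * GB     # on-disk index size

bytes_per_doc = index_bytes / docs   # roughly 7.5 KiB per document

target_docs = 2_500_000   # corpus size from the original question
est_index_bytes = target_docs * bytes_per_doc

print(f"indexed bytes/doc: {bytes_per_doc / 1024:.1f} KiB")
print(f"estimated index:   {est_index_bytes / GB:.1f} GiB")
```

By that estimate the whole corpus would land well under the 100 GB single-shard comfort zone Toke mentions below.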

During peak traffic, the 95th percentile response time was about three seconds, but that is
because the queries are entire homework questions, up to 1000 words, pasted into the query
box. Yes, we have very unusual queries. Median response time was much better, about 50 milliseconds.

wunder
Walter Underwood
wunder@wunderwood.org
http://observer.wunderwood.org/  (my blog)


> On Jan 22, 2016, at 2:45 PM, Toke Eskildsen <te@statsbiblioteket.dk> wrote:
> 
> Aswath Srinivasan (TMS) <aswath.srinivasan@toyota.com> wrote:
>> *         Totally about 2.5 million documents to  be indexed
>> *         Documents average size is 512 KB - pdfs and htmls
> 
>> This being said I was thinking I would take Solr to production with,
>> *         2 shards, 1 Leader & 3 Replicas
> 
>> Do you all think this setup will work? Will this serve me 150 QPS?
> 
> It certainly helps that you are batch updating. What is missing in this estimation is
> how large the documents are when indexed, as I guess the ½MB average is for the raw files?
> If they are your everyday short PDFs with images, meaning not a lot of text, handling 2M+
> of them is easy. If they are all full-length books, it is another matter.
> 
> Your document count is relatively low, and if your index data ends up being not-too-big
> (let's say 100GB), then you ought to consider having just a single shard with 4 replicas:
> there is a non-trivial overhead in going from one shard to more than one, especially if
> you are doing faceting.
> 
> - Toke Eskildsen

