lucene-solr-user mailing list archives

From Toke Eskildsen ...@statsbiblioteket.dk>
Subject RE: How large is your solr index?
Date Mon, 29 Dec 2014 21:30:25 GMT
Bram Van Dam [bram.vandam@intix.eu] wrote:
> I'm trying to get a feel of how large Solr can grow without slowing down
> too much. We're looking into a use-case with up to 100 billion documents
> (SolrCloud), and we're a little afraid that we'll end up requiring 100
> servers to pull it off.

One recurring theme on this list is that it is very hard to compare indexes. Even if the data
structure happens to be the same, performance will vary drastically depending on the types
of queries and the processing requested. That being said, I acknowledge that it helps with
stories to get a feel of what can be done.

A second caveat is that I find it an exercise in futility to talk about scale without an
idea of expected response times as well as the expected number of concurrent users. If you
are just doing some nightly batch processing, you could probably run your (scaling up from
your description) 100TB index off spinning drives on a couple of boxes. If you expect to be
hammered with millions of requests per day, you would have to put a zero or two behind that
number.
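
To put "millions of requests per day" in perspective, a rough sketch of the conversion to an average request rate follows. The function name and the sample volumes are illustrative assumptions, not measurements from any real setup, and real traffic peaks well above its daily average.

```python
SECONDS_PER_DAY = 24 * 60 * 60  # 86,400


def average_requests_per_second(requests_per_day: float) -> float:
    """Average rate only; says nothing about peak load or query cost."""
    return requests_per_day / SECONDS_PER_DAY


if __name__ == "__main__":
    # Hypothetical daily volumes, chosen only for illustration.
    for per_day in (1_000_000, 10_000_000, 100_000_000):
        rate = average_requests_per_second(per_day)
        print(f"{per_day:>11,} requests/day ~ {rate:,.1f} requests/s on average")
```

Even a single million requests per day works out to roughly a dozen queries per second around the clock, which is a very different workload from a nightly batch job.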

End of sermon.

At Lucene/Solr Revolution 2014, Grant Ingersoll also asked for user stories and pointed to
https://wiki.apache.org/solr/SolrUseCases - sadly it has not caught on. The only entry is
for our (State and University Library, Denmark) setup with 21TB / 7 billion documents on a
single machine. To follow my own advice, I can elaborate that we have 1-3 concurrent users
and a design goal of median response times below 2 seconds for faceted search. I guess that
is at the larger end of the spectrum for pure size, but at the very low end for usage.
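
As a back-of-the-envelope check, the figures above imply an average index footprint per document. This sketch assumes decimal terabytes (10^12 bytes), which the message does not specify, and the function name is made up for illustration. The result describes this one index and should not be read as a rule for other data.

```python
def average_bytes_per_document(index_bytes: float, documents: float) -> float:
    """Average on-disk index size per document."""
    return index_bytes / documents


if __name__ == "__main__":
    index_bytes = 21 * 10**12  # 21TB, assuming decimal terabytes
    documents = 7 * 10**9      # 7 billion documents
    per_doc = average_bytes_per_document(index_bytes, documents)
    print(f"about {per_doc:,.0f} bytes of index per document")
```

Under that assumption the setup averages about 3 KB of index per document.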

- Toke Eskildsen
