lucene-solr-user mailing list archives

From ralph tice <ralph.t...@gmail.com>
Subject Re: How large is your solr index?
Date Mon, 29 Dec 2014 19:08:59 GMT
Like all things it really depends on your use case.  We have >160B
documents in our largest SolrCloud and doing a *:* to get that count takes
~13-14 seconds.  Doing a text:happy query only takes ~3.5-3.6 seconds cold,
subsequent queries for the same terms take <500ms.  We have a little over
3TB of RAM in the cluster, which is around 1/10th the size of the index on
disk, stored on fast SSDs (rated 300K IOPS per machine); but more importantly we are using
12-13 large machines rather than dozens or hundreds of small machines, and
if your use case is primarily full text search you probably could get away
with even fewer machines depending on query patterns.  We run several JVMs
per machine and many shards per JVM, but are careful to order shards so
that queries get dispersed across multiple JVMs across multiple machines
wherever possible.
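For rough capacity planning, the figures above work out to something like the following back-of-envelope sketch (the RAM, ratio, and machine count are the ones quoted; everything derived from them is approximate):

```python
# Back-of-envelope sizing from the figures above: ~3 TB of cluster RAM at
# roughly 1/10th of the on-disk index size, spread over 12-13 large machines.
ram_tb = 3.0               # total cluster RAM (quoted above)
disk_to_ram_ratio = 10     # index on disk is ~10x the RAM
machines = 13              # upper end of the quoted machine count

index_tb = ram_tb * disk_to_ram_ratio        # ~30 TB of index on SSD
index_per_machine_tb = index_tb / machines   # ~2.3 TB of index per machine

print(f"approx. total index: {index_tb:.0f} TB")
print(f"approx. index per machine: {index_per_machine_tb:.1f} TB")
```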

Facets over high cardinality fields are going to be painful.  We currently
programmatically limit the range to around 1/12th or 1/13th of the data set
for facet queries, but plan on evaluating Heliosearch (initial results
didn't look promising) and Toke's sparse faceting patch (SOLR-5894) to help
out there.
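The range limiting we do is conceptually like the sketch below: clamp the facet query to a recent slice of the overall time range before it ever hits Solr. The `timestamp` and `user_id` field names are hypothetical, not our actual schema:

```python
from datetime import datetime

def facet_window(start: datetime, end: datetime, fraction: int = 12):
    """Clamp a facet request to the most recent 1/fraction-th of the
    overall data range, so the facet never runs over the full set."""
    span = end - start
    window_start = end - span / fraction
    # Illustrative Solr query params; field names are hypothetical.
    return {
        "q": "*:*",
        "fq": f"timestamp:[{window_start.isoformat()}Z TO {end.isoformat()}Z]",
        "facet": "true",
        "facet.field": "user_id",  # a high-cardinality field
    }

params = facet_window(datetime(2014, 1, 1), datetime(2014, 12, 31))
```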

If any given JVM goes OOM that also becomes a rough time operationally.  If
your indexing rate spikes past what your sharding strategy can handle, that
sucks too.

There could be more support / ease of use enhancements for moving shards
across SolrClouds, moving shards across physical nodes within a
SolrCloud, and snapshot/restore of a SolrCloud, but there has also been a
lot of recent work in these areas that is starting to provide the
underlying infrastructure for more advanced shard management.

I think there are more people getting into the space of >100B documents but
I only ran into or discovered a handful during my time at Lucene/Solr
Revolution this November.  The majority of large scale SolrCloud users seem
to have many collections (collections per logical user) rather than many
documents in one/few collections.

Regards,
--Ralph

On Mon Dec 29 2014 at 11:55:41 AM Erick Erickson <erickerickson@gmail.com>
wrote:

> When you say 2B docs on a single Solr instance, are you talking only one
> shard?
> Because if you are, you're very close to the absolute upper limit of a
> shard: internally
> the doc id is an int, i.e. 2^31. Going past 2^31 will cause all sorts of problems.
>
> But yeah, your 100B documents are going to use up a lot of servers...
>
> Best,
> Erick
>
> On Mon, Dec 29, 2014 at 7:24 AM, Bram Van Dam <bram.vandam@intix.eu>
> wrote:
> > Hi folks,
> >
> > I'm trying to get a feel of how large Solr can grow without slowing down
> too
> > much. We're looking into a use-case with up to 100 billion documents
> > (SolrCloud), and we're a little afraid that we'll end up requiring 100
> > servers to pull it off.
> >
> > The largest index we currently have is ~2 billion documents in a single
> Solr
> > instance. Documents are smallish (5k each) and we have ~50 fields in the
> > schema, with an index size of about 2TB. Performance is mostly OK. Cold
> > searchers take a while, but most queries are alright after warming up. I
> > wish I could provide more statistics, but I only have very limited
> access to
> > the data (...banks...).
> >
> > I'd be very grateful to anyone sharing statistics, especially on the larger
> end
> > of the spectrum -- with or without SolrCloud.
> >
> > Thanks,
> >
> >  - Bram
>
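For what it's worth, the per-shard doc-id ceiling mentioned above translates directly into a hard lower bound on shard count for a 100B-document target; a quick sketch (in practice you would leave generous headroom well below 2^31 per shard):

```python
import math

# Lucene's internal doc id is a signed 32-bit int, so a single shard
# cannot hold more than ~2^31 documents.
MAX_DOCS_PER_SHARD = 2**31 - 1

target_docs = 100_000_000_000  # the 100B documents under discussion
min_shards = math.ceil(target_docs / MAX_DOCS_PER_SHARD)

print(min_shards)  # 47 shards at the absolute limit; plan for far more
```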
