lucene-solr-user mailing list archives

From Bram Van Dam
Subject Re: How large is your solr index?
Date Tue, 30 Dec 2014 08:19:09 GMT
On 12/29/2014 08:08 PM, ralph tice wrote:
> Like all things it really depends on your use case.  We have >160B
> documents in our largest SolrCloud and doing a *:* to get that count takes
> ~13-14 seconds.  Doing a text:happy query only takes ~3.5-3.6 seconds cold,
> subsequent queries for the same terms take <500ms.

That seems perfectly reasonable.

> Facets over high cardinality fields are going to be painful.  We currently
> programmatically limit the range to around 1/12th or 1/13th of the data set
> for facet queries, but plan on evaluating Heliosearch (initial results
> didn't look promising) and Toke's sparse faceting patch (SOLR-5894) to help
> out there.

We had a look at Heliosearch a while ago and found it unsuitable. Seems 
like they're trying to make use of some native x86_64 code and HotSpot 
JVM specific features which we can't use. Some of our clients use IBM's 
JVM so we're pretty much limited to strictly Java.
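
For anyone following along, limiting facet queries to a slice of the data set, as described in the quoted paragraph, might look roughly like the sketch below. This is only an illustration under assumptions: the field names ("created", "user_id"), the choice of a date field to slice on, and the helper facet_slice_params are all hypothetical and not taken from the setup described above. Only the standard Solr request parameters (q, fq, rows, facet, facet.field) are used.

```python
from datetime import datetime
from urllib.parse import urlencode


def facet_slice_params(facet_field, range_field, start, end, slices, index, query="*:*"):
    """Build Solr parameters that facet over one slice of a date range.

    Hypothetical helper: the range [start, end) is cut into `slices` equal
    parts and only part number `index` (0-based) is used as a filter.
    """
    if not 0 <= index < slices:
        raise ValueError("index must be in [0, slices)")
    width = (end - start) / slices            # timedelta for one slice
    lo = start + index * width
    hi = start + (index + 1) * width
    fmt = "%Y-%m-%dT%H:%M:%SZ"                # Solr date format (UTC assumed)
    return {
        "q": query,
        # Inclusive lower bound, exclusive upper bound, so slices do not overlap.
        "fq": f"{range_field}:[{lo.strftime(fmt)} TO {hi.strftime(fmt)}}}",
        "rows": 0,                            # only facet counts are wanted
        "facet": "true",
        "facet.field": facet_field,
    }


# Example: facet over the first 1/12th of 2014 (field names are made up).
params = facet_slice_params(
    "user_id", "created", datetime(2014, 1, 1), datetime(2015, 1, 1), 12, 0
)
query_string = urlencode(params)
```

The resulting query string would be appended to the collection's /select URL. Whether a filter like this actually helps depends on how the data is distributed over the chosen range field.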

> There could be more support / ease of use enhancements for moving shards
> across SolrClouds, moving shards across physical nodes within a
> SolrCloud, and snapshot/restore of a SolrCloud, but there has also been a
> lot of recent work in these areas that are starting to provide the
> underlying infrastructure for more advanced shard management.

That's reassuring to hear. If we run into these issues we can probably 
donate some time to work on them, so I'm not too worried about that.

> I think there are more people getting into the space of >100B documents but
> I only ran into or discovered a handful during my time at Lucene/Solr
> Revolution this November.  The majority of large scale SolrCloud users seem
> to have many collections (collections per logical user) rather than many
> documents in one/few collections.

That's my understanding as well. Lucene Revolution is on the wrong side 
of the Atlantic for me. But there's an Open Source Search devroom at 
FOSDEM this year, which seems like a sensible place to discuss these 
things. I'll make a post on the relevant mailing lists about this after 
the holidays if anyone is interested.

Thanks for your detailed response!

  - Bram
