lucene-solr-user mailing list archives

From Toke Eskildsen <...@statsbiblioteket.dk>
Subject RE: SolrCloud performance issues regarding hardware configuration
Date Fri, 18 Jul 2014 22:06:30 GMT
search engn dev [sachinyadav0025@gmail.com] wrote:
> out of 700 million documents 95-97% values are unique approx.

That's quite a lot. If you are not already using DocValues for that, you should do so.

So, each shard handles ~175M documents. Even with DocValues, there is an overhead to just
having the base facet structure in memory, but that should be doable with 8GB of heap. It
helps if your field is single-valued and marked as such in the schema. If your indexes are
rarely updated, you can consider optimizing down to a single segment: single-valued field +
DocValues + single segment = very low memory overhead for the base facet structures.
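
For reference, a field set up that way in schema.xml could look like the sketch below. The
field name user_digest comes from this thread; the string type and the stored flag are
assumptions, so adjust to your own schema:

  <field name="user_digest" type="string" indexed="true" stored="false"
         docValues="true" multiValued="false"/>

If you do optimize, the stock update handler can trigger it, e.g. (collection name assumed):

  curl 'http://localhost:8983/solr/collection1/update?optimize=true&maxSegments=1'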

However, as Eric states, each call temporarily allocates ~175M counter buckets (one int each),
which means 700MB of heap allocation per concurrent call. When you get the call to work, you
should severely limit the number of concurrent searches. Remember to do the limiting outside
of SolrCloud, to avoid deadlocks.
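
To illustrate, here is a minimal client-side sketch of such a limiter, using SolrJ with a
java.util.concurrent.Semaphore. The host, collection name and permit count are assumptions,
not recommendations:

  import java.util.concurrent.Semaphore;

  import org.apache.solr.client.solrj.SolrQuery;
  import org.apache.solr.client.solrj.SolrServerException;
  import org.apache.solr.client.solrj.impl.HttpSolrServer;
  import org.apache.solr.client.solrj.response.QueryResponse;

  public class ThrottledFacetClient {
    // At most 2 facet calls in flight; each one temporarily costs
    // ~700MB of heap on the Solr side with stock faceting.
    private static final Semaphore PERMITS = new Semaphore(2);
    // Hypothetical endpoint - point this at your own setup.
    private final HttpSolrServer server =
        new HttpSolrServer("http://localhost:8983/solr/collection1");

    public QueryResponse facetOnUserDigest(String q)
        throws SolrServerException, InterruptedException {
      PERMITS.acquire(); // block here, in the client, not inside SolrCloud
      try {
        SolrQuery query = new SolrQuery(q);
        query.setRows(0);             // only the facet counts are wanted
        query.setFacet(true);
        query.addFacetField("user_digest");
        query.setFacetLimit(10);
        return server.query(query);
      } finally {
        PERMITS.release();
      }
    }
  }

The point is simply that the semaphore lives in your own client tier: Solr itself never
blocks, so a flood of facet requests cannot deadlock the cloud or exhaust its heap.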

Performance of high-cardinality faceting with stock Solr is not great (though not that bad
either, considering what it does), but if you really need it you can work around some of the
issues. SOLR-5894 (https://issues.apache.org/jira/browse/SOLR-5894 - I am the author) makes it
possible to get results faster for small result sets and to limit the amount of memory used by
the buckets: if the maximum possible count for any term in your user_digest field is 20 (just
an example), the buckets need only be 6 bits each (2^6 = 64 > 20) and the temporary memory
allocation for each call will be 175M * 6 bits / 8 bits/byte ≈ 131MByte. Read
http://sbdevel.wordpress.com/2014/04/04/sparse-facet-counting-without-the-downsides/
for a detailed description.
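
The arithmetic is easy to check; an illustrative calculation with the numbers from above (not
the SOLR-5894 code itself):

  public class FacetBucketMath {
    public static void main(String[] args) {
      long terms = 175_000_000L;        // unique terms per shard, from this thread

      // Stock Solr: one 32-bit int counter per term.
      long stockBytes = terms * 32 / 8; // 700,000,000 bytes ~ 700MB

      // Packed counters: 6 bits per bucket is enough when no term
      // can occur more than 63 times (2^6 - 1 = 63 >= 20).
      long packedBytes = terms * 6 / 8; // 131,250,000 bytes ~ 131MB

      System.out.printf("stock:  %,d bytes%n", stockBytes);
      System.out.printf("packed: %,d bytes%n", packedBytes);
    }
  }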

- Toke Eskildsen
