lucene-solr-user mailing list archives

From Erick Erickson <erickerick...@gmail.com>
Subject Re: facet.method: enum vs. fc
Date Mon, 11 Oct 2010 17:30:29 GMT
Yep, that was probably the best choice....

It's a classic time/space tradeoff. The enum method creates a bitset for
*each* unique facet value. Each bitset is (maxDocs / 8) bytes in size (I'm
ignoring some overhead here). So if your facet field has 10 unique values
and 8M documents, you'll use up 10M bytes or so; 20 unique values will use
up 20M bytes, and so on. But this is very, very fast.
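
To make the arithmetic above concrete, here is a rough sketch of the estimate. It follows only the simplified model in this message (one bit per document per unique value, overhead ignored); the class and method names are made up for illustration and are not part of Solr.

```java
public class EnumFacetMemory {
    // Bytes for one bitset with one bit per document (rounded up), overhead ignored.
    static long bitsetBytes(long maxDocs) {
        return (maxDocs + 7) / 8;
    }

    // Rough total for facet.method=enum under the simplified model:
    // one bitset per unique facet value.
    static long enumBytes(long uniqueValues, long maxDocs) {
        return uniqueValues * bitsetBytes(maxDocs);
    }

    public static void main(String[] args) {
        long maxDocs = 8_000_000L;
        System.out.println(enumBytes(10, maxDocs)); // prints 10000000
        System.out.println(enumBytes(20, maxDocs)); // prints 20000000
    }
}
```

The estimate grows linearly with the number of unique values, which is why enum becomes expensive on high-cardinality fields.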

fc, on the other hand, eats up cache for storing the string value of each
unique value, plus various counter arrays (several bytes/doc). For most
cases, it will use less memory than enum, but will be slower.
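
For readers who want to see the two strategies side by side, the following is a toy sketch, not Solr's actual code. It assumes a single-valued field, uses java.util.BitSet in place of Solr's internal document sets, and all names (enumCounts, fcCounts, docToOrd) are hypothetical. The enum-style method intersects one bitset per term with the query's bitset; the fc-style method walks the matching documents and tallies each document's term via a per-document ordinal array.

```java
import java.util.BitSet;
import java.util.LinkedHashMap;
import java.util.Map;

public class FacetMethodsSketch {
    // enum-style: one BitSet per unique term; count = size of (termDocs AND queryDocs).
    static Map<String, Integer> enumCounts(Map<String, BitSet> termDocs, BitSet queryDocs) {
        Map<String, Integer> counts = new LinkedHashMap<>();
        for (Map.Entry<String, BitSet> e : termDocs.entrySet()) {
            BitSet hits = (BitSet) e.getValue().clone(); // copy so the cached set is untouched
            hits.and(queryDocs);
            counts.put(e.getKey(), hits.cardinality());
        }
        return counts;
    }

    // fc-style: docToOrd[doc] is the index of the doc's term in terms, or -1 if it has none.
    static Map<String, Integer> fcCounts(String[] terms, int[] docToOrd, BitSet queryDocs) {
        int[] tally = new int[terms.length];
        for (int doc = queryDocs.nextSetBit(0); doc >= 0; doc = queryDocs.nextSetBit(doc + 1)) {
            int ord = docToOrd[doc];
            if (ord >= 0) {
                tally[ord]++;
            }
        }
        Map<String, Integer> counts = new LinkedHashMap<>();
        for (int i = 0; i < terms.length; i++) {
            counts.put(terms[i], tally[i]);
        }
        return counts;
    }

    public static void main(String[] args) {
        String[] terms = {"red", "blue"};
        int[] docToOrd = {0, 1, 0, 1, 0, -1}; // doc 5 has no value

        // Build the per-term bitsets from the same toy data.
        Map<String, BitSet> termDocs = new LinkedHashMap<>();
        for (String t : terms) {
            termDocs.put(t, new BitSet());
        }
        for (int doc = 0; doc < docToOrd.length; doc++) {
            if (docToOrd[doc] >= 0) {
                termDocs.get(terms[docToOrd[doc]]).set(doc);
            }
        }

        BitSet queryDocs = new BitSet();
        queryDocs.set(0);
        queryDocs.set(1);
        queryDocs.set(2);
        queryDocs.set(5);

        System.out.println(enumCounts(termDocs, queryDocs));       // prints {red=2, blue=1}
        System.out.println(fcCounts(terms, docToOrd, queryDocs));  // prints {red=2, blue=1}
    }
}
```

Both methods return the same counts; they differ in what they keep in memory (a bitset per term versus per-document arrays plus the term strings) and in what they loop over (terms versus matching documents).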

I'd stick with fc for the time being and think about enum if (1) you have a
good idea of what the number of unique terms is, or (2) you start to need to
fine-tune your speed.
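
If you later switch a single low-cardinality field to enum while leaving the others on fc, Solr's per-field parameter form f.<fieldname>.facet.method can be used. Below is a small sketch that only assembles the request parameters as a query string; the field names "category" and "author" are invented for the example, and the helper class is not part of SolrJ. It assumes Java 10 or later for URLEncoder.encode(String, Charset).

```java
import java.net.URLEncoder;
import java.nio.charset.StandardCharsets;
import java.util.LinkedHashMap;
import java.util.Map;
import java.util.StringJoiner;

public class FacetQueryParams {
    // URL-encode one key=value pair.
    static String param(String key, String value) {
        return URLEncoder.encode(key, StandardCharsets.UTF_8)
                + "=" + URLEncoder.encode(value, StandardCharsets.UTF_8);
    }

    // Build a query string that facets on each field with its own facet.method.
    static String build(String query, Map<String, String> fieldMethods) {
        StringJoiner qs = new StringJoiner("&");
        qs.add(param("q", query));
        qs.add(param("facet", "true"));
        for (Map.Entry<String, String> e : fieldMethods.entrySet()) {
            qs.add(param("facet.field", e.getKey()));
            qs.add(param("f." + e.getKey() + ".facet.method", e.getValue()));
        }
        return qs.toString();
    }

    public static void main(String[] args) {
        Map<String, String> methods = new LinkedHashMap<>();
        methods.put("category", "enum"); // hypothetical field with few unique values
        methods.put("author", "fc");     // hypothetical field with many unique values
        System.out.println(build("*:*", methods));
    }
}
```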

HTH
Erick

On Mon, Oct 11, 2010 at 11:30 AM, Paolo Castagna <castagna.lists@googlemail.com> wrote:

> Hi,
> I am using Solr v1.4 and I am not sure which facet.method I should use.
>
> What should I use if I do not know in advance if the number of values
> for a given field will be high or low?
>
> What are the pros/cons of using facet.method=enum vs. facet.method=fc?
>
> When should I use enum vs. fc?
>
> I have found some comments and suggestions here:
>
>  "enum enumerates all terms in a field, calculating the set intersection
>  of documents that match the term with documents that match the query.
>  This was the default (and only) method for faceting multi-valued fields
>  prior to Solr 1.4.
>  "fc (stands for field cache), the facet counts are calculated by
>  iterating over documents that match the query and summing the terms
>  that appear in each document. This was the default method for single
>  valued fields prior to Solr 1.4.
>  The default value is fc (except for BoolField) since it tends to use
>  less memory and is faster when a field has many unique terms in the
>  index."
>  -- http://wiki.apache.org/solr/SimpleFacetParameters#facet.method
>
>  "facet.method=enum [...] this is excellent for fields where there is
>  a small set of distinct values. The average number of values per
>  document does not matter.
>  facet.method=fc [...] this is excellent for situations where the
>  number of indexed values for the field is high, but the number of
>  values per document is low. For multi-valued fields, a hybrid approach
>  is used that uses term filters from the filterCache for terms that
>  match many documents."
>  -- http://wiki.apache.org/solr/SolrFacetingOverview
>
>  "If you are faceting on a field that you know only has a small number
>  of values (say less than 50), then it is advisable to explicitly set
>  this to enum. When faceting on multiple fields, remember to set this
>  for the specific fields desired and not universally for all facets.
>  The request handler configuration is a good place to put this."
>  -- Book: "Solr 1.4 Enterprise Search Server", p. 148
>
> This is the part of the Solr code which deals with the facet.method
> parameter:
>
>  if (enumMethod) {
>    counts = getFacetTermEnumCounts([...]);
>  } else {
>    if (multiToken) {
>      UnInvertedField uif = [...]
>      counts = uif.getCounts([...]);
>    } else {
>      [...]
>      if (per_segment) {
>        [...]
>        counts = ps.getFacetCounts([...]);
>      } else {
>        counts = getFieldCacheCounts([...]);
>      }
>    }
>  }
>  -- https://svn.apache.org/repos/asf/lucene/dev/trunk/solr/src/java/org/apache/solr/request/SimpleFacets.java
>
> See also:
>
>  - http://stackoverflow.com/questions/2902680/how-well-does-solr-scale-over-large-number-of-facet-values
>
> In the end, since I do not know in advance the number of different
> values for my fields, I went for facet.method=fc. Does this seem
> reasonable to you?
>
> Thank you,
> Paolo
>
