lucene-solr-user mailing list archives

From Toke Eskildsen <...@statsbiblioteket.dk>
Subject RE: (Issue) How improve solr facet performance
Date Sat, 24 May 2014 07:17:20 GMT
Alice.H.Yang (mis.cnsh04.Newegg) 41493 [Alice.H.Yang@newegg.com] wrote:
> 1.  I'm sorry, I have made a mistake, the total number of documents is 32 Million, not 320 Million.
> 2.  The system memory is large for solr index, OS total has 256G, I set the solr tomcat HEAPSIZE="-Xms25G -Xmx100G"

100G is a very high number. What special requirements dictate such a large heap size?

> Reply:  9 fields I facet on.

Solr treats each facet field separately, and with facet.method=fc and 10M hits this means that it
will iterate 9*10M = 90M document IDs and update the counters for those.
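
Very roughly, and this is only an illustration of the cost model rather than Solr's actual code
(the counter arrays, hit set and ordinal lookup below are made-up stand-ins), the fc work looks like:

// Simplified sketch of facet.method=fc counting: one counter increment per (facet field, hit).
int numFields = 9;
int[][] counts = new int[numFields][];
for (int f = 0; f < numFields; f++) {
  counts[f] = new int[uniqueValueCount[f]];   // ~100 or fewer counters per field in your case
  for (int docId : hitDocIds) {               // ~10M hits
    counts[f][ordinal[f][docId]]++;           // 9 * 10M = 90M increments in total
  }
}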

> Reply:  3 facet fields have one hundred unique values, the other 6 facet fields' unique values are between 3 and 15.

So very low cardinality. This is confirmed by your low response time of 6ms for 2925 hits.

> And we tested this scenario: if the facet fields have few unique values and we add facet.method=enum, there is a small performance improvement.

That is a shame: enum is normally the simple answer to a setup like yours. Have you tried
fine-tuning your fc/enum selection, so that the 3 fields with around a hundred values use fc and
the rest use enum? That might halve your response time.
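
For reference, facet.method can be overridden per field, so a single request can mix both
methods. A rough example (the field names below are made up):

facet=true
&facet.field=bigField1&f.bigField1.facet.method=fc
&facet.field=smallField1&f.smallField1.facet.method=enum
&facet.field=smallField2&f.smallField2.facet.method=enum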


Since the number of unique facet values is so low, I do not think that DocValues can help you here.
Besides the fine-grained fc/enum-selection above, you could try collapsing all 9 facet-fields
into a single field. The idea behind this is that for facet.method=fc, faceting on a field with
(for example) 300 unique values takes practically the same amount of time as faceting on a field
with 1000 unique values, so faceting on a single slightly larger field is much faster than
faceting on 9 smaller fields. After faceting with facet.limit=-1 on the single super-facet-field,
you must map the returned values back to their original fields:


If you have the facet-fields

field0: 34
field1: 187
field2: 78432
field3: 3
...

then collapse them by OR-ing each value with a field-specific mask that is bigger than the max
value in any field, and put it all into a single field:

fieldAll: 0xA0000000 | 34
fieldAll: 0xA1000000 | 187
fieldAll: 0xA2000000 | 78432
fieldAll: 0xA3000000 | 3
...

Then perform the facet request on fieldAll with facet.limit=-1 and split the resulting counts with
something like

// Split the counts for the collapsed fieldAll back into the original fields.
// The high byte identifies the source field; the low 3 bytes hold the original value.
for (entry : facetResultAll) {
  int originalValue = entry.value & 0x00FFFFFF;   // strip the field mask
  switch (entry.value & 0xFF000000) {
    case 0xA0000000:
      field0.add(originalValue, entry.count);
      break;
    case 0xA1000000:
      field1.add(originalValue, entry.count);
      break;
...
  }
}
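
The corresponding facet request would be something along the lines of (plain request parameters,
assuming fieldAll is indexed as described above):

q=*:*&facet=true&facet.field=fieldAll&facet.limit=-1&facet.mincount=1

facet.mincount=1 keeps zero-count values out of the response, which matters when facet.limit=-1
returns every value.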


Regards,
Toke Eskildsen, State and University Library, Denmark
