lucene-solr-user mailing list archives

From Fuad Efendi <>
Subject SimpleFacets: Performance Boost for Tokenized Fields
Date Mon, 18 Aug 2008 15:10:44 GMT

Term Vectors could be much faster than intersecting with the FilterCache.
The exception is when the size of the DocSet is close to (more than 50% of)
the total count of documents in the index.

When it works (100 times faster than current; very specific scenario):
- use stored Term Vectors;
- 10,000,000 documents in the index;
- 5-10 terms per document;
- 200,000 unique terms for a tokenized field.

Obviously, calculating the sizes of 200,000 intersections with the FilterCache
is slower than traversing 10 - 20,000 documents for smaller DocSets
and counting the frequencies of Terms.
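
To make the comparison concrete, here is a minimal sketch of the two counting strategies. It uses plain Java collections, not the actual Solr or Lucene API: the class FacetCountSketch, its method names, and the int[][] stand-in for stored term vectors are all hypothetical, and each document is assumed to list each of its terms once.

```java
import java.util.BitSet;

public class FacetCountSketch {

    // Hypothetical helper: build one doc-id bitset per term from the per-document
    // term lists (a stand-in for the cached per-term filters).
    static BitSet[] invert(int[][] termVectors, int numTerms) {
        BitSet[] termDocs = new BitSet[numTerms];
        for (int t = 0; t < numTerms; t++) {
            termDocs[t] = new BitSet(termVectors.length);
        }
        for (int doc = 0; doc < termVectors.length; doc++) {
            for (int term : termVectors[doc]) {
                termDocs[term].set(doc);
            }
        }
        return termDocs;
    }

    // Strategy 1: one intersection per unique term.
    // The number of intersections grows with the number of unique terms.
    static int[] countByIntersection(BitSet[] termDocs, BitSet docSet) {
        int[] counts = new int[termDocs.length];
        for (int t = 0; t < termDocs.length; t++) {
            BitSet hits = (BitSet) termDocs[t].clone();
            hits.and(docSet);
            counts[t] = hits.cardinality();
        }
        return counts;
    }

    // Strategy 2: visit only the documents in the DocSet and tally their terms.
    // The work grows with (docs in DocSet) x (terms per document).
    static int[] countByTermVectors(int[][] termVectors, BitSet docSet, int numTerms) {
        int[] counts = new int[numTerms];
        for (int doc = docSet.nextSetBit(0); doc >= 0; doc = docSet.nextSetBit(doc + 1)) {
            for (int term : termVectors[doc]) {
                counts[term]++;
            }
        }
        return counts;
    }

    public static void main(String[] args) {
        // Toy index: 4 documents, 3 unique terms; the DocSet holds documents 0 and 2.
        int[][] termVectors = { {0, 1}, {1}, {0, 2}, {2} };
        BitSet docSet = new BitSet();
        docSet.set(0);
        docSet.set(2);

        int[] viaVectors = countByTermVectors(termVectors, docSet, 3);
        int[] viaFilters = countByIntersection(invert(termVectors, 3), docSet);
        System.out.println(java.util.Arrays.toString(viaVectors));
        System.out.println(java.util.Arrays.toString(viaFilters));
    }
}
```

With the figures quoted above, the second strategy touches at most 20,000 x 10 = 200,000 term entries, while the first performs 200,000 set intersections, each of which has to look at a whole document set. Once the DocSet covers a large share of the index, that advantage shrinks, which matches the exception noted at the top.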

There are some related TODOs in SOLR source.


Fuad Efendi
Tokenizer Inc.
