lucene-solr-user mailing list archives

From Fuad Efendi <f...@efendi.ca>
Subject SimpleFacets: Performance Boost for Tokenized Fields
Date Mon, 18 Aug 2008 15:10:44 GMT
Hello:


Term Vectors could be much faster than computing intersections with the
FilterCache. Exception: when the size of the DocSet is close to the total
count of documents in the index (more than 50% of it).

When it works (100 times faster than the current implementation; very specific scenario):
- use stored Term Vectors;
- 10,000,000 documents in the index;
- 5-10 terms per document;
- 200,000 unique terms for a tokenized field.

Obviously, calculating the sizes of 200,000 intersections with the
FilterCache is slower than traversing 10 - 20,000 documents for smaller
DocSets and counting the frequencies of Terms.
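
To illustrate the difference, here is a minimal sketch in Python on a toy
in-memory index. It is not Solr code: term_vectors, term_filters and both
function names are made up for this example, and plain dicts and sets stand
in for stored Term Vectors and cached filters. The first function does one
intersection per unique term, the second does one Term Vector read per
document in the DocSet; both return the same counts.

```python
from collections import Counter

# Toy index: doc id -> terms of one tokenized field
# (a hypothetical stand-in for stored term vectors).
term_vectors = {
    0: ["solr", "facet"],
    1: ["solr", "lucene"],
    2: ["lucene", "index"],
    3: ["facet", "cache"],
}

# Inverted view: term -> set of doc ids
# (a hypothetical stand-in for per-term filters held in a cache).
term_filters = {}
for doc_id, terms in term_vectors.items():
    for term in terms:
        term_filters.setdefault(term, set()).add(doc_id)


def facet_by_intersection(doc_set):
    """Work grows with the number of unique terms: one intersection each."""
    counts = Counter()
    for term, docs in term_filters.items():
        n = len(docs & doc_set)
        if n:
            counts[term] = n
    return counts


def facet_by_term_vectors(doc_set):
    """Work grows with the size of the DocSet: one term-vector read each."""
    counts = Counter()
    for doc_id in doc_set:
        # set() so a term repeated inside one document is counted once
        counts.update(set(term_vectors[doc_id]))
    return counts


doc_set = {0, 1}
by_filters = facet_by_intersection(doc_set)
by_vectors = facet_by_term_vectors(doc_set)
```

With 200,000 unique terms and a DocSet of a few thousand documents at 5-10
terms each, the second loop touches far fewer items than the first, which
is the scenario described above.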


There are some related TODOs in the Solr source.


-- 
Thanks,

Fuad Efendi
416-993-2060(cell)
Tokenizer Inc.
==============
http://www.linkedin.com/in/liferay
http://www.tokenizer.org





