lucene-java-user mailing list archives

From Chris Hostetter <hossman_luc...@fucit.org>
Subject Re: Facet search
Date Thu, 24 Feb 2011 01:59:13 GMT

: This is another indicator that we should really try to extract Solr's
: capabilities like Faceting into modules! Solr should not be required
: if you want to use the faceting stuff we already have.

the most basic logic of (field) faceting used by solr is simple TermEnum 
iteration and document set intersection.  Any Lucene application can do 
that w/o really refactoring any code out of Solr.  it's very 
straightforward.
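
A minimal sketch of that idea, to make "term iteration plus document set intersection" concrete. It deliberately avoids the Lucene API: the sorted map of terms to java.util.BitSet is a hypothetical stand-in for what a TermEnum and its postings would give you, and the class and method names are made up for illustration, not taken from Solr.

```java
import java.util.BitSet;
import java.util.LinkedHashMap;
import java.util.Map;
import java.util.SortedMap;

// Hypothetical stand-in for an inverted index; NOT Lucene or Solr API.
class TermEnumFacets {

    /**
     * For every term of one field, count the documents in baseDocs
     * (the docs matching the user's query) that contain the term.
     *
     * @param postings term -> ids of all documents containing that term
     * @param baseDocs ids of the documents matching the main query
     */
    static Map<String, Integer> facetCounts(SortedMap<String, BitSet> postings,
                                            BitSet baseDocs) {
        Map<String, Integer> counts = new LinkedHashMap<>();
        // "TermEnum iteration": walk the terms of the field in sorted order.
        for (Map.Entry<String, BitSet> entry : postings.entrySet()) {
            // "document set intersection": docs with this term AND in the result set.
            BitSet hits = (BitSet) entry.getValue().clone(); // copy, keep postings intact
            hits.and(baseDocs);
            int n = hits.cardinality();
            if (n > 0) {
                counts.put(entry.getKey(), n);
            }
        }
        return counts;
    }

    /** Small helper to build a doc id set. */
    static BitSet docs(int... ids) {
        BitSet set = new BitSet();
        for (int id : ids) {
            set.set(id);
        }
        return set;
    }
}
```

With postings blue={0,2,3}, green={4}, red={1,2,5} and a result set of {1,2,3}, facetCounts returns blue=2 and red=2, and green is dropped because it has no matching docs.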

The real value adds that solr provides are:

 * DocSet caching and cache warming, which solr can do for you because it 
knows when the index changes (because it manages all the writes and reader 
reopening).  
 * selecting alternate facet algorithms based on schema knowledge -- looking 
at field types and value cardinality to determine when FieldCache or 
UnInvertedField would be more efficient than TermEnumeration and DocSets
 * accurate counts when doing distributed searching

These aren't things that seem like they could really be extracted in a very 
reusable manner -- the prerequisites and scaffolding you'd need to 
set up and use these pieces in a meaningful way outside of solr would 
probably wind up being just like solr.
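
To illustrate why the choice of algorithm in the second bullet depends on cardinality, here is a rough sketch of the FieldCache-style alternative for a single-valued field: keep one term ordinal per document and count by walking the matching docs, instead of intersecting one set per term. The array layout and all names are assumptions made for illustration; this is not Solr's actual FieldCache or UnInvertedField code.

```java
import java.util.BitSet;
import java.util.LinkedHashMap;
import java.util.Map;

// Hypothetical un-inverted layout for a single-valued field; NOT Solr API.
class OrdinalFacets {

    /**
     * @param terms    sorted distinct terms of the field; ordinal i means terms[i]
     * @param docToOrd docToOrd[doc] is the ordinal of that doc's value, or -1 if none
     * @param baseDocs ids of the documents matching the main query
     */
    static Map<String, Integer> facetCounts(String[] terms, int[] docToOrd,
                                            BitSet baseDocs) {
        int[] counts = new int[terms.length];
        // One step per matching doc, independent of how many distinct terms exist.
        for (int doc = baseDocs.nextSetBit(0); doc >= 0; doc = baseDocs.nextSetBit(doc + 1)) {
            if (doc >= docToOrd.length) {
                break; // ids beyond the index carry no value
            }
            int ord = docToOrd[doc];
            if (ord >= 0) {
                counts[ord]++;
            }
        }
        Map<String, Integer> result = new LinkedHashMap<>();
        for (int i = 0; i < terms.length; i++) {
            if (counts[i] > 0) {
                result.put(terms[i], counts[i]);
            }
        }
        return result;
    }
}
```

Term enumeration costs one set intersection per distinct term, while this walk costs one array lookup per matching document, so it tends to pay off when a field has many distinct values. Deciding between the two needs to know the field's type and cardinality up front, which is the schema knowledge mentioned above.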

There are however lots of pieces that could be extracted and reused -- but 
those things have already been started/discussed (DocSets, hooks for 
generic caches that are notified when IndexReaders are reopened, or 
segments are changed, multivalue support in FieldCache, etc...)

: >> I am using Lucene for my project and we have new requirement  to present
: >> data in the form of Analytics. Facet could be used for that but for this

that's kind of a vague requirement -- if you can elaborate a bit on what 
types of info you actually want to compute/return, there may be a very 
straightforward way to do it.  

like i said: the basics of faceting over all terms in a field is *really* 
trivial ... the original implementation in Solr was about 40 lines of 
code...

http://svn.apache.org/viewvc/incubator/solr/trunk/src/java/org/apache/solr/request/SimpleFacets.java?view=markup&pathrev=441175#l163


-Hoss
