lucene-solr-dev mailing list archives

From "Greg Ludington" <gluding...@gmail.com>
Subject Faceted Browsing Question/Discussion
Date Wed, 19 Jul 2006 21:00:41 GMT
I have implemented faceted browsing in a prototype I have been working
on with Solr, but I would like to ask some more experienced hands
about the performance implications. Currently, I calculate the count of
a given facet as follows:

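       // documents matching this facet item's query (cache-aware lookup)
       DocSet valueDocSet = req.getSearcher().getDocSet(item.getQuery());
       // how many of those documents are also in the current result set
       long count = valueDocSet.intersectionSize(results);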

Is this the preferred way to obtain such a count, or is there another
way, such as dealing directly with BitSets (something I avoided, since
it appears getBits() is deprecated in the DocSet interface)?
Similarly, since this method is commented as "cache-aware", does that
mean that the item itself does not need to worry about caching its
results, only its terms, since the results will end up in the
queryResultCache?  Or is this assumption incorrect, and should each
facet/item be concerned with caching its results as well?
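To make the BitSet alternative concrete: a facet count is the cardinality of the intersection of two document sets. The sketch below uses plain java.util.BitSet as a stand-in for Solr's DocSet (it does not touch the DocSet API); the class and method names are hypothetical.

```java
import java.util.BitSet;

// Hypothetical sketch: counting a facet value's hits within the current
// result set, using java.util.BitSet in place of Solr's DocSet.
// Each set bit represents an internal document id.
public class FacetCountSketch {

    // Returns the size of (valueDocs AND results) without modifying either input.
    static long intersectionCount(BitSet valueDocs, BitSet results) {
        BitSet tmp = (BitSet) valueDocs.clone(); // copy so a cached set stays intact
        tmp.and(results);
        return tmp.cardinality();
    }

    public static void main(String[] args) {
        BitSet results = new BitSet();
        results.set(2);
        results.set(5);
        results.set(7);

        BitSet inStock = new BitSet();
        inStock.set(5);
        inStock.set(8);

        System.out.println(intersectionCount(inStock, results)); // prints 1
    }
}
```

The clone allocates a full copy per facet value on every request; a dedicated intersection-size method can avoid that by counting matching bits without materializing the intersection.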

Apologies for sending this to solr-dev, and not solr-user, but I
thought this might also segue into a discussion on faceted browsing in
general.  To that end, my current structure defines:

- a <facetHandler/> entry in solrconfig.xml, the only current
implementation of which loads a set of Facet definitions from an xml
file.
- each Facet contains an id for lookups and a List of FacetItems (some
statically configured, some constructed dynamically from available
Terms, though these are not backed by any cache yet).
- each FacetItem contains a displayName and a Query (and its associated queryString).
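A minimal sketch of the Facet/FacetItem structure described in the list above. The class layout and method names are assumptions for illustration; the prototype also keeps a parsed Lucene Query on each item, which is omitted here so the example runs without Lucene.

```java
import java.util.ArrayList;
import java.util.Collections;
import java.util.List;

// Hypothetical sketch of the Facet / FacetItem model.
// Only the query string is stored; a real item would also hold a parsed Query.
public class FacetModelSketch {

    static final class FacetItem {
        final String displayName;  // e.g. "In Stock"
        final String queryString;  // e.g. inStock:true

        FacetItem(String displayName, String queryString) {
            this.displayName = displayName;
            this.queryString = queryString;
        }
    }

    static final class Facet {
        final String id;  // lookup key, e.g. "instock"
        private final List<FacetItem> items = new ArrayList<>();

        Facet(String id) {
            this.id = id;
        }

        // Appends an item and returns this facet so calls can be chained.
        Facet add(String displayName, String queryString) {
            items.add(new FacetItem(displayName, queryString));
            return this;
        }

        List<FacetItem> getItems() {
            return Collections.unmodifiableList(items);
        }
    }

    public static void main(String[] args) {
        Facet inStock = new Facet("instock")
                .add("In Stock", "inStock:true")
                .add("Out of Stock", "inStock:false");
        for (FacetItem item : inStock.getItems()) {
            System.out.println(item.displayName + " -> " + item.queryString);
        }
    }
}
```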

With this in place, a request that adds these parameters to the query:
&ft=xmlfacets&f=man&f=instock

would use the facetHandler "xmlfacets" to add the following to the results:

<lst name="facets">
  <arr name="man">
    <lst>
      <str name="fq">manu_exact:"ASUS Computer Inc."</str>
      <long name="count">0</long>
      <str name="displayName">ASUS Computer Inc.</str>
    </lst>
    <lst>
      <str name="fq">manu_exact:"ATI Technologies"</str>
      <long name="count">0</long>
      <str name="displayName">ATI Technologies</str>
    </lst>
    <lst>
      <str name="fq">manu_exact:"Dell, Inc."</str>
      <long name="count">1</long>
      <str name="displayName">Dell, Inc.</str>
    </lst>
  </arr>
  <arr name="instock">
    <lst>
      <str name="fq">inStock:true</str>
      <long name="count">1</long>
      <str name="displayName">In Stock</str>
    </lst>
    <lst>
      <str name="fq">inStock:false</str>
      <long name="count">0</long>
      <str name="displayName">Out of Stock</str>
    </lst>
  </arr>
</lst>
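The request convention above carries one handler name (ft) and an ordered, repeatable list of facet ids (f). The sketch below shows one way to pull those out of a raw query string; it is illustrative only, since Solr already parses request parameters for a handler, and the class and field names are hypothetical.

```java
import java.net.URLDecoder;
import java.nio.charset.StandardCharsets;
import java.util.ArrayList;
import java.util.List;

// Hypothetical sketch: extracting the facet handler name (ft) and the
// repeated facet ids (f) from a raw query string such as
// "&ft=xmlfacets&f=man&f=instock".
public class FacetParamsSketch {

    static final class FacetRequest {
        String handler;                                   // value of ft
        final List<String> facetIds = new ArrayList<>();  // values of f, in request order
    }

    static FacetRequest parse(String query) {
        FacetRequest req = new FacetRequest();
        for (String pair : query.split("&")) {
            if (pair.isEmpty()) {
                continue; // e.g. the empty piece before a leading '&'
            }
            int eq = pair.indexOf('=');
            String key = eq < 0 ? pair : pair.substring(0, eq);
            String value = eq < 0 ? ""
                    : URLDecoder.decode(pair.substring(eq + 1), StandardCharsets.UTF_8);
            if ("ft".equals(key)) {
                req.handler = value;        // last ft wins
            } else if ("f".equals(key)) {
                req.facetIds.add(value);    // f may repeat
            }
        }
        return req;
    }

    public static void main(String[] args) {
        FacetRequest r = parse("&ft=xmlfacets&f=man&f=instock");
        System.out.println(r.handler + " " + r.facetIds); // prints xmlfacets [man, instock]
    }
}
```

Keeping the facet ids as an ordered list lets the output arrays appear in the same order as the f parameters in the request.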

The basic handling and output format work for my prototype's purposes,
but I have not yet delved deeply into caching. Does this setup seem
appropriate, and does the above-mentioned caching assumption seem
valid? Or have I missed something that would help support facets on a
larger scale?
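One way to frame the caching question: if each facet item's document set is cached by its query string, a request only pays for the intersection with the current results. The sketch below illustrates that idea with a small LRU map and java.util.BitSet; it is not Solr's cache implementation, the loader function is a hypothetical stand-in for running the item's query, and the map is not thread-safe.

```java
import java.util.BitSet;
import java.util.LinkedHashMap;
import java.util.Map;
import java.util.function.Function;

// Hypothetical sketch: caching per-item document sets keyed by query string,
// so repeated requests only recompute the intersection with their results.
public class FacetCacheSketch {

    // Minimal LRU map: access-ordered LinkedHashMap that evicts its eldest entry.
    // Not thread-safe; a server would need synchronization or a concurrent cache.
    static final class LruCache<K, V> extends LinkedHashMap<K, V> {
        private final int maxEntries;

        LruCache(int maxEntries) {
            super(16, 0.75f, true); // true = access order, which makes eviction LRU
            this.maxEntries = maxEntries;
        }

        @Override
        protected boolean removeEldestEntry(Map.Entry<K, V> eldest) {
            return size() > maxEntries;
        }
    }

    private final LruCache<String, BitSet> docSets;
    private final Function<String, BitSet> loader; // stand-in for executing the query
    int loads = 0;                                 // number of cache misses

    FacetCacheSketch(int maxEntries, Function<String, BitSet> loader) {
        this.docSets = new LruCache<>(maxEntries);
        this.loader = loader;
    }

    long count(String itemQuery, BitSet results) {
        BitSet docs = docSets.get(itemQuery);
        if (docs == null) {
            docs = loader.apply(itemQuery);
            loads++;
            docSets.put(itemQuery, docs);
        }
        BitSet tmp = (BitSet) docs.clone(); // never mutate the cached set
        tmp.and(results);
        return tmp.cardinality();
    }

    public static void main(String[] args) {
        FacetCacheSketch cache = new FacetCacheSketch(100, q -> {
            BitSet b = new BitSet();
            if (q.equals("inStock:true")) {
                b.set(1);
                b.set(3);
            }
            return b;
        });
        BitSet results = new BitSet();
        results.set(3);
        results.set(4);
        System.out.println(cache.count("inStock:true", results)); // prints 1
        System.out.println(cache.count("inStock:true", results)); // prints 1 (cache hit)
        System.out.println(cache.loads);                          // prints 1
    }
}
```

The per-item sets depend only on the index, not on the user's query, so they can be shared across requests, while the intersection must be recomputed per result set; that split is what makes caching the item sets worthwhile.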

Thanks,
Greg
