lucene-solr-dev mailing list archives

From Mike Klaas
Subject Re: Intuition check
Date Wed, 14 Nov 2007 05:42:56 GMT

On 8-Nov-07, at 4:34 PM, Chris Hostetter wrote:

> : First, how to determine whether the filter-embedding would be effective?  We
> 	...
> : really available.  It can be estimated assuming the filter and query are
> : independent, but this definitely isn't always true.  If the filter
>
> I was assuming we could use a simple heuristic...
>     if( configOption < docSet.size()/numDocs() )

Another case that comes to mind is if the matching query is a  
MatchAllDocsQuery, in which case the filter should probably be used  
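
A minimal sketch of the selectivity heuristic quoted above, together with the independence estimate mentioned in the earlier message. All names here (FilterEmbeddingHeuristic, shouldEmbed, maxFraction) are hypothetical and not part of Solr. The sketch assumes the filter is embedded when it matches a small fraction of the index, following the "if it's small" wording used elsewhere in this thread; the quoted snippet does not show which branch its comparison selects. With Java ints, docSet.size()/numDocs() truncates to 0 unless the filter matches every document, so the division is done in floating point here.

```java
// Hypothetical helper for illustration only; none of these names exist in Solr.
final class FilterEmbeddingHeuristic {

    /** Fraction of the index matched by the filter, computed in floating point. */
    static double selectivity(int filterSize, int numDocs) {
        if (numDocs <= 0) {
            return 0.0; // empty index: nothing to match
        }
        return (double) filterSize / numDocs;
    }

    /**
     * Assumed policy: embed the filter in the main query when it matches
     * less than maxFraction of the index (the "configOption" in the thread).
     */
    static boolean shouldEmbed(int filterSize, int numDocs, double maxFraction) {
        return selectivity(filterSize, numDocs) < maxFraction;
    }

    /**
     * Expected size of the filter/query intersection if the two were
     * statistically independent: |F| * |Q| / N. As noted in the thread,
     * independence often does not hold, so this is only a rough estimate.
     */
    static double estimatedIntersection(int filterSize, int querySize, int numDocs) {
        if (numDocs <= 0) {
            return 0.0;
        }
        return (double) filterSize * querySize / numDocs;
    }
}
```

For example, with a threshold of 0.05, a filter matching 10 of 1000 documents would be embedded, while one matching 500 of 1000 would not.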

> : Second, embedding the filter itself.  This is much more nettlesome within
> : SolrIndexSearcher than within one of the request handlers.  One problem is the
>
> really?  why should it be?

Sorry, that sentence was the product of thinking-while-responding,  
which is always a recipe for being wrong <g>.  I had a particular  
query structure in mind, one that had the matching clauses embedded  
in the inner "core" of the query with several layers of score  
modification queries wrapped on top of this (e.g. dismax's various  
boost queries; yonik's multiplicative boost queries).  I was  
imagining that it was necessary to embed the filter clauses in the  
"core" to produce an effective implementation.  By the time I  
finished my response, I had read enough of the relevant Lucene scorer  
code (in particular, ReqOptScorer) to realize that the benefits would  
be had using an outer-layer ConjunctionQuery as well.
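
To illustrate why an outer-layer conjunction can still pay off: a conjunction advances whichever side is behind directly to the other side's current document, so a sparse filter lets the scored query skip most of its postings regardless of how deeply the matching clauses are nested. The following is a self-contained sketch using hypothetical classes (SortedDocIterator, ConjunctionSketch); it is not Lucene's Scorer API, and a binary search stands in for whatever skipping the real postings provide.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;

// Hypothetical stand-in for a scorer or filter: iterates sorted doc ids.
final class SortedDocIterator {
    static final int NO_MORE_DOCS = Integer.MAX_VALUE;

    private final int[] docs; // sorted ascending, no duplicates
    private int pos = 0;

    SortedDocIterator(int[] sortedDocs) {
        this.docs = sortedDocs;
    }

    /** Moves to the first doc id >= target at or after the current position. */
    int advance(int target) {
        int idx = Arrays.binarySearch(docs, pos, docs.length, target);
        pos = idx >= 0 ? idx : -idx - 1; // not found: use the insertion point
        return pos < docs.length ? docs[pos] : NO_MORE_DOCS;
    }
}

final class ConjunctionSketch {
    /** Leapfrog intersection: the side that is behind jumps to the other's doc. */
    static List<Integer> intersect(SortedDocIterator query, SortedDocIterator filter) {
        List<Integer> hits = new ArrayList<>();
        int q = query.advance(0);
        int f = filter.advance(0);
        while (q != SortedDocIterator.NO_MORE_DOCS && f != SortedDocIterator.NO_MORE_DOCS) {
            if (q == f) {
                hits.add(q); // both sides match this document
                q = query.advance(q + 1);
                f = filter.advance(f + 1);
            } else if (q < f) {
                q = query.advance(f); // query skips ahead to the filter's doc
            } else {
                f = filter.advance(q); // filter skips ahead to the query's doc
            }
        }
        return hits;
    }
}
```

With a query matching {1, 3, 5, 7, 9, 11, 13} and a filter matching {5, 13, 20}, the query side is advanced straight to 5 and then to 13 rather than visiting every one of its documents.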

> anything the request handler can do to muck with the Query object
> SolrIndexSearcher can do as well .. and by the time
> getDocListNC/getDocListAndSetNC are called the "pure negative" issues are
> already resolved.
>
> The only difference is that in those methods we already have a DocSet
> (instead of a Query) but it should be easy to wrap a DocSet in a Query to
> add to the main query.
>
> : ISTM then that the main challenge is in determining when the filter
> : intersection should be embedded.  Also, the ability to control filter caching
> : is still difficult with this implementation, but perhaps that's less
> : important.
>
> yeah ... it seems like there are two orthogonal use cases...
>   1) "here is an 'fq', i know it's not worth caching" ... in which we
>      don't put it in the filterCache.
>   2) "here is an 'fq'" ... in which we get the DocSet and add it to the
>      main query if it's small.
> for any given input, 1 and 2 might both apply, or just 1, or just 2

True.  I'm tempted to implement a <!nocache> directive via embedding  
(without advertising the fact), and work on the fq optimization  

