On 8-Nov-07, at 4:34 PM, Chris Hostetter wrote:
>
> : First, how to determine whether the filter-embedding would be effective? We
> ...
> : really available. It can be estimated assuming the filter and query are
> : independent, but this definitely isn't always true. If the filter
>
> I was assuming we could use a simple heuristic...
> if( configOption < docSet.size()/numDocs() )
Another case that comes to mind is if the matching query is a
MatchAllDocsQuery, in which case the filter should probably be used
directly.
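
A minimal sketch of the heuristic discussed above, including the MatchAllDocsQuery case. It is plain Java with no Solr or Lucene dependency; the class, enum, and parameter names are hypothetical. The direction of the comparison (embed when the filter is a small fraction of the index) is an assumption, since the quoted snippet is cut off before its branches. The second method is the independence estimate mentioned in the quoted text, which can be badly off when the filter and query are correlated.

```java
// Hypothetical helper; none of these names exist in Solr or Lucene.
final class FilterEmbedHeuristic {

    /** How the filter should be applied for one request. */
    enum Plan { USE_FILTER_DIRECTLY, EMBED_IN_QUERY, APPLY_AS_FILTER }

    /**
     * @param filterSize   number of documents in the filter's DocSet
     * @param numDocs      number of documents in the index
     * @param maxFraction  configured threshold (the "configOption" above)
     * @param matchAllDocs true when the main query matches every document
     */
    static Plan choose(int filterSize, int numDocs, double maxFraction, boolean matchAllDocs) {
        if (matchAllDocs) {
            // The query adds no restriction, so the filter alone is the result set.
            return Plan.USE_FILTER_DIRECTLY;
        }
        if (numDocs <= 0) {
            return Plan.APPLY_AS_FILTER; // empty index: nothing to gain from embedding
        }
        // Floating-point division: int / int would truncate to 0 for any partial filter.
        double fraction = (double) filterSize / numDocs;
        // Assumption: a small (selective) filter is the case worth embedding.
        return fraction < maxFraction ? Plan.EMBED_IN_QUERY : Plan.APPLY_AS_FILTER;
    }

    /** Expected intersection size if the filter and query were independent. */
    static double estimateIntersection(int filterSize, int querySize, int numDocs) {
        if (numDocs <= 0) {
            return 0.0;
        }
        return (double) filterSize * querySize / numDocs;
    }
}
```

For example, with a threshold of 0.01, a 100-document filter on a 1,000,000-document index would be embedded, while a 500,000-document filter would not.
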
> : Second, embedding the filter itself. This is much more nettlesome within
> : SolrIndexSearcher than within one of the request handlers. One problem is the
>
> really? why should it be?
Sorry, that sentence was the product of thinking-while-responding,
which is always a recipe for being wrong <g>. I had a particular
query structure in mind, one that had the matching clauses embedded
in the inner "core" of the query with several layers of score
modification queries wrapped on top of this (e.g. dismax's various
boost queries; yonik's multiplicative boost queries). I was
imagining that it was necessary to embed the filter clauses in the
"core" to produce an effective implementation. By the time I
finished my response, I had read enough of the relevant Lucene scorer
code (in particular, ReqOptScorer) to realize that the benefits would
be had using an outer-layer ConjunctionQuery as well.
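
To illustrate why an outer-layer conjunction is enough, here is a rough sketch of a leapfrog intersection over two sorted doc-id lists. It is not Lucene code: the class and method names are hypothetical, and plain int arrays stand in for scorers. The point is only that a conjunction lets the small side drive, so the large side skips ahead rather than visiting every match.

```java
import java.util.ArrayList;
import java.util.List;

// Hypothetical stand-in for a conjunction of a filter and a query;
// sorted int arrays play the role of doc-id iterators.
final class SortedDocIdConjunction {

    /** Returns the first index >= from whose value is >= target, or ids.length. */
    static int advance(int[] ids, int from, int target) {
        int lo = from;
        int hi = ids.length;
        while (lo < hi) {
            int mid = (lo + hi) >>> 1;
            if (ids[mid] < target) {
                lo = mid + 1;
            } else {
                hi = mid;
            }
        }
        return lo;
    }

    /** Intersects two ascending doc-id arrays, skipping ahead on whichever side is behind. */
    static List<Integer> intersect(int[] filterDocs, int[] queryDocs) {
        List<Integer> hits = new ArrayList<>();
        int f = 0;
        int q = 0;
        while (f < filterDocs.length && q < queryDocs.length) {
            int fd = filterDocs[f];
            int qd = queryDocs[q];
            if (fd == qd) {
                hits.add(fd);
                f++;
                q++;
            } else if (fd < qd) {
                f = advance(filterDocs, f + 1, qd); // filter is behind: jump to >= qd
            } else {
                q = advance(queryDocs, q + 1, fd);  // query is behind: jump to >= fd
            }
        }
        return hits;
    }
}
```

With a three-document filter, the loop touches only a handful of positions in the query's list regardless of how many documents the query matches.
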
> anything the request handler can do to muck with the Query object
> SolrIndexSearcher can do as well .. and by the time
> getDocListNC/getDocListAndSetNC are called the "pure negative"
> issues are already resolved.
>
> The only difference is that in those methods we already have a DocSet
> (instead of a Query) but it should be easy to wrap a DocSet in a
> Query to
> add to the main query.
>
> : ISTM then that the main challenge is in determining when the filter
> : intersection should be embedded. Also, the ability to control filter caching
> : is still difficult with this implementation, but perhaps that's less
> : important.
>
> yeah ... it seems like there are two orthogonal use cases...
> 1) "here is an 'fq', i know it's not worth caching" ... in which we
> don't put it in the filterCache.
> 2) "here is an 'fq'" ... in which we get the DocSet and add it to the
> main query if it's small.
>
> for any given input, 1 and 2 might both apply, or just 1, or just 2
True. I'm tempted to implement a <!nocache> directive via embedding
(without advertising the fact), and work on the fq optimization
separately.
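
A rough sketch of how such a directive might be recognized. Everything here is an assumption: the directive is taken to be a literal `<!nocache>` prefix on the fq value, and the class and field names are hypothetical. A non-cacheable fq would then be embedded in the main query instead of being looked up in the filterCache.

```java
// Hypothetical parser; the prefix position and all names are assumptions.
final class NoCacheDirective {

    static final String PREFIX = "<!nocache>";

    /** Result of inspecting one fq parameter value. */
    static final class ParsedFq {
        final String query;      // the fq text with any directive stripped
        final boolean cacheable; // false when the directive was present

        ParsedFq(String query, boolean cacheable) {
            this.query = query;
            this.cacheable = cacheable;
        }
    }

    /** Strips a leading directive, if any, and records whether caching is allowed. */
    static ParsedFq parse(String fq) {
        String trimmed = fq.trim();
        if (trimmed.startsWith(PREFIX)) {
            return new ParsedFq(trimmed.substring(PREFIX.length()).trim(), false);
        }
        return new ParsedFq(trimmed, true);
    }
}
```
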
Thanks,
-Mike