lucene-dev mailing list archives

From Paul Elschot <paul.elsc...@xs4all.nl>
Subject Re: regex-based query contribution
Date Thu, 13 Oct 2005 20:15:36 GMT
On Thursday 13 October 2005 20:15, markharw00d wrote:
> Sounds like a very useful addition but as yet another variant of "term 
> expanding" queries (fuzzy/prefix/range/wildcard) now might be a good 
> time to re-raise the scoring issue I originally identified here with all 
> such queries: http://issues.apache.org/jira/browse/LUCENE-329
> 
> The issue is that "automagically" expanded terms are rewritten to a 
> standard boolean query and because of the default IDF factor behaviour, 
> rarer (often misspelt) terms are favoured over more common ones.
> I don't imagine this is desirable behaviour for anyone.
> 
> I did provide an implementation that addressed this by ensuring all 
> generated terms in the boolean query used the same IDF. The search 
> results I posted showed a clear improvement. Unfortunately this was not 
> rolled into core; however, the other auto-expanding issue I raised on 
> this JIRA bug, to do with coords, was addressed by adding disableCoord to 
> BooleanQuery.
> Since this time a lot of work has gone into BooleanQuery scoring, not 
> all of it committed, so I'm not sure how best to address this concern or 
> what code to extend/modify.
> Anyone (Paul?) have any suggestions?

These expanding queries result in a disjunction over expanded terms,
so make them rewrite to a new query, for example ExpandedTermsQuery.
Then let the Weight normalisation of ExpandedTermsQuery do the work of
"flattening" the idf, as in your implementation, and make this Weight 
provide a DisjunctionSumScorer over the reweighted terms.
This DisjunctionSumScorer does not use any further weighting or coordination.
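
A minimal sketch of what that flattening could look like, written as
standalone Java rather than against the Lucene API. Everything in it is
an assumption made for illustration: the class FlatIdfSketch, its method
names, the simplified score of sqrt(tf) * idf, and the choice of the mean
idf as the shared value (the maximum or any other single value would
serve equally well). The idf formula is the classic 1 + ln(numDocs /
(docFreq + 1)).

```java
import java.util.HashMap;
import java.util.Map;

// Hypothetical illustration only; none of these names exist in Lucene.
class FlatIdfSketch {

    // Classic idf: 1 + ln(numDocs / (docFreq + 1)).
    static double idf(int docFreq, int numDocs) {
        return 1.0 + Math.log(numDocs / (double) (docFreq + 1));
    }

    // Assumption: the shared idf is the mean idf of the expanded terms.
    static double sharedIdf(Map<String, Integer> docFreqs, int numDocs) {
        double sum = 0.0;
        for (int df : docFreqs.values()) {
            sum += idf(df, numDocs);
        }
        return sum / docFreqs.size();
    }

    // Plain sum over the expanded terms that occur in the document,
    // with no coordination factor and no further weighting.
    static double score(Map<String, Integer> termFreqsInDoc,
                        Map<String, Integer> docFreqs,
                        int numDocs,
                        boolean flatten) {
        double flat = sharedIdf(docFreqs, numDocs);
        double total = 0.0;
        for (Map.Entry<String, Integer> e : termFreqsInDoc.entrySet()) {
            Integer df = docFreqs.get(e.getKey());
            if (df == null) {
                continue; // not one of the expanded terms
            }
            double termIdf = flatten ? flat : idf(df, numDocs);
            total += Math.sqrt(e.getValue()) * termIdf;
        }
        return total;
    }

    public static void main(String[] args) {
        // Made-up document frequencies: a common term and a rare misspelling.
        Map<String, Integer> docFreqs = new HashMap<>();
        docFreqs.put("lucene", 1000);
        docFreqs.put("lucine", 1);

        Map<String, Integer> docWithCommon = new HashMap<>();
        docWithCommon.put("lucene", 1);
        Map<String, Integer> docWithRare = new HashMap<>();
        docWithRare.put("lucine", 1);

        int numDocs = 10000;
        System.out.println("per-term idf, common: " + score(docWithCommon, docFreqs, numDocs, false));
        System.out.println("per-term idf, rare:   " + score(docWithRare, docFreqs, numDocs, false));
        System.out.println("flat idf, common:     " + score(docWithCommon, docFreqs, numDocs, true));
        System.out.println("flat idf, rare:       " + score(docWithRare, docFreqs, numDocs, true));
    }
}
```

With per-term idf the document containing only the rare variant scores
higher; with the shared idf both documents score the same, so only the
term frequencies separate them.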

A more general solution would be to use a subclass of BooleanQuery that
provides a Weight that flattens all the weights of the subqueries, for example
to the maximum weight, and for the rest works like the usual Weight of
BooleanQuery.
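
As a rough sketch of this second option, again in standalone Java and
not using any Lucene classes: the class MaxWeightFlattenSketch and its
methods are hypothetical, the subquery weights and raw scores are plain
numbers, and "works like the usual Weight" is reduced here to a weighted
sum multiplied by a coordination factor of matched clauses over total
clauses.

```java
import java.util.Arrays;

// Hypothetical illustration only; none of these names exist in Lucene.
class MaxWeightFlattenSketch {

    // Replace every subquery weight with the maximum weight.
    static double[] flattenToMax(double[] weights) {
        double max = Double.NEGATIVE_INFINITY;
        for (double w : weights) {
            max = Math.max(max, w);
        }
        double[] flat = new double[weights.length];
        Arrays.fill(flat, max);
        return flat;
    }

    // Simplified boolean-style scoring: the weighted sum over the
    // matching clauses, times coord = matched / total clauses.
    static double score(double[] weights, double[] rawScores, boolean[] matches) {
        double sum = 0.0;
        int matched = 0;
        for (int i = 0; i < weights.length; i++) {
            if (matches[i]) {
                sum += weights[i] * rawScores[i];
                matched++;
            }
        }
        if (matched == 0) {
            return 0.0;
        }
        return sum * matched / (double) weights.length;
    }

    public static void main(String[] args) {
        double[] weights = {1.0, 3.5, 2.0};      // made-up subquery weights
        double[] rawScores = {1.0, 1.0, 1.0};    // made-up per-clause scores
        boolean[] matches = {true, false, true}; // clauses matching one document

        System.out.println("original weights:  " + score(weights, rawScores, matches));
        System.out.println("flattened weights: " + score(flattenToMax(weights), rawScores, matches));
    }
}
```

Since nothing in this sketch is specific to terms, the subqueries could
be of any type, as long as each one exposes a weight that can be
overridden.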

The choice between these two depends on how specific the flattening
mechanism is to terms. Could it be generalized to any subquery
of BooleanQuery?

Regards,
Paul Elschot


---------------------------------------------------------------------
To unsubscribe, e-mail: java-dev-unsubscribe@lucene.apache.org
For additional commands, e-mail: java-dev-help@lucene.apache.org