lucene-dev mailing list archives

From markharw00d <markharw...@yahoo.co.uk>
Subject Re: regex-based query contribution
Date Thu, 13 Oct 2005 18:15:07 GMT
Sounds like a very useful addition. But since this is yet another
variant of the "term expanding" queries (fuzzy/prefix/range/wildcard),
now might be a good time to re-raise the scoring issue I originally
identified with all such queries:
http://issues.apache.org/jira/browse/LUCENE-329

The issue is that "automagically" expanded terms are rewritten to a 
standard boolean query and, because of the default IDF factor behaviour, 
rarer (often misspelt) terms score higher than more common ones.
I don't imagine this is desirable behaviour for anyone.
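
To make the effect concrete, here is a minimal standalone sketch (no
Lucene dependency) of the classic DefaultSimilarity formula,
idf = 1 + ln(numDocs / (docFreq + 1)). The class name, the index size
and the document frequencies are made-up values for illustration only.

```java
public class IdfDemo {
    // Classic Lucene DefaultSimilarity formula:
    // idf = 1 + ln(numDocs / (docFreq + 1))
    static double idf(int docFreq, int numDocs) {
        return Math.log(numDocs / (double) (docFreq + 1)) + 1.0;
    }

    public static void main(String[] args) {
        int numDocs = 100000;   // hypothetical index size
        int dfCommon = 5000;    // hypothetical docFreq of the correct spelling
        int dfRare = 3;         // hypothetical docFreq of a misspelt variant

        // The rare variant gets the larger idf, so documents matching only
        // the misspelling tend to be ranked above those matching the
        // common term.
        System.out.println("common term idf = " + idf(dfCommon, numDocs));
        System.out.println("rare term idf   = " + idf(dfRare, numDocs));
    }
}
```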

I did provide an implementation that addressed this by ensuring all 
generated terms in the boolean query used the same IDF. The search 
results I posted showed a clear improvement. Unfortunately this was not 
rolled into core. However, the other auto-expanding issue I raised on 
that JIRA bug, to do with coords, was addressed by adding disableCoord 
to BooleanQuery.
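
As a rough illustration of the shared-IDF idea (this is not the patch
attached to LUCENE-329), the sketch below gives every expanded term the
same idf. Which shared value to use is an assumption here: it takes the
idf of the most frequent variant. The class and method names and the
sample frequencies are hypothetical.

```java
import java.util.Arrays;

public class SharedIdfSketch {
    // Classic Lucene DefaultSimilarity formula:
    // idf = 1 + ln(numDocs / (docFreq + 1))
    static double idf(int docFreq, int numDocs) {
        return Math.log(numDocs / (double) (docFreq + 1)) + 1.0;
    }

    // Hypothetical policy: every expanded term is weighted with the idf of
    // the most frequent variant, so no variant is boosted for being rare.
    static double[] sharedIdfWeights(int[] docFreqs, int numDocs) {
        int maxDocFreq = 0;
        for (int df : docFreqs) {
            maxDocFreq = Math.max(maxDocFreq, df);
        }
        double[] weights = new double[docFreqs.length];
        Arrays.fill(weights, idf(maxDocFreq, numDocs));
        return weights;
    }

    public static void main(String[] args) {
        int numDocs = 100000;            // hypothetical index size
        int[] docFreqs = {5000, 40, 3};  // hypothetical docFreqs of expanded terms
        System.out.println(Arrays.toString(sharedIdfWeights(docFreqs, numDocs)));
    }
}
```

With equal idf per term, the ranking among matches is driven by term
frequency and length normalisation rather than by how rare each
expanded variant happens to be.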
Since then a lot of work has gone into BooleanQuery scoring, not 
all of it committed, so I'm not sure how best to address this concern or 
what code to extend/modify.
Anyone (Paul?) have any suggestions?


Cheers,
Mark




		