lucene-java-user mailing list archives

Site index · List index
Message view « Date » · « Thread »
Top « Date » · « Thread »
From Chris Hostetter <hossman_luc...@fucit.org>
Subject Re: Help: tweaking search - reducing IDF skew and implementing score cutoff
Date Fri, 10 Feb 2006 07:56:21 GMT
: Sunday gets ranked highly due to idf. How do I reduce this skewness
: due to the date-posted field? I saw a reference earlier to
: ConstantScoreRangeQuery on JIRA - is it the solution?

Yes.  RangeQuery expands to a BooleanQuery containing all of the terms in
the.  The number of terms (and the frequency of thsoe terms in the index)
will allways affect those scores.  This is why i constantly argue that
when using dates or numbers a RangeQuery never makes sense -- allways use
a RangeFilter, and if you must have a "Query" object, use
ConstantScoreRangeQuery.

: 2. If I choose to sort the results by date, then recent documents with
: very very low relevancy (say the words searched appears only in
: content, and not in title/bylines/summary fields that are boosted
: higher) are still shown relatively high in the list, and I wish to
: omit them in general. What is the best way to implement some sort of a
: relevancy filter (include only documents with an normalized score of
: 0.2 or more....)? Or is there a better way around it?

there is no safe way to filter by score, this is mentioned in the FAQ...

http://wiki.apache.org/jakarta-lucene/LuceneFAQ#head-912c1f237bb00259185353182948e5935f0c2f03

An alternate approach is to sort by score, but use something like a
FunctionQuery to inflate the scores of more recent documents...

https://issues.apache.org/jira/browse/LUCENE-446




-Hoss


---------------------------------------------------------------------
To unsubscribe, e-mail: java-user-unsubscribe@lucene.apache.org
For additional commands, e-mail: java-user-help@lucene.apache.org


Mime
View raw message