lucene-java-user mailing list archives

From Chris Lamprecht <clampre...@gmail.com>
Subject Re: Help: tweaking search - reducing IDF skew and implementing score cutoff
Date Fri, 10 Feb 2006 08:08:33 GMT
> 2. If I choose to sort the results by date, then recent documents with
> very very low relevancy (say the words searched appears only in
> content, and not in title/bylines/summary fields that are boosted
> higher) are still shown relatively high in the list, and I wish to
> omit them in general. What is the best way to implement some sort of a
> relevancy filter (include only documents with an normalized score of
> 0.2 or more....)? Or is there a better way around it?

As Chris pointed out, there isn't always an easy way to do this.  Your
suggestion of dropping hits with a normalized score below 0.2 might
work, assuming the most relevant document scores 1.0.  You'll have to
tune the cutoff and see how well it works.  One thing to watch out
for: scores are only normalized when the top raw (non-normalized)
score is greater than 1.0.  If the top raw score is below 1.0, nothing
is scaled, so your most relevant document can have a score of less
than 1.0, and a fixed 0.2 cutoff then removes a larger share of the
result list.  This may or may not be what you want, just something to
consider.  Lucene's Hits.java is where the normalization happens.
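
To make the caveat concrete, here is a rough sketch of that rule plus
a cutoff filter.  It is plain Java over an array of raw scores, not
the actual Lucene code; the class and method names (ScoreCutoff,
normalize, keepAbove) are made up for illustration, and it assumes
the scaling works the way described above.

```java
import java.util.ArrayList;
import java.util.List;

// Hypothetical helper, not part of Lucene.
public class ScoreCutoff {

    // Scale scores so the top hit is 1.0, but only when the top raw
    // score is greater than 1.0; otherwise leave them untouched.
    static float[] normalize(float[] rawScores) {
        float max = 0.0f;
        for (float s : rawScores) {
            max = Math.max(max, s);
        }
        float norm = (max > 1.0f) ? 1.0f / max : 1.0f;
        float[] out = new float[rawScores.length];
        for (int i = 0; i < rawScores.length; i++) {
            out[i] = rawScores[i] * norm;
        }
        return out;
    }

    // Return the indexes of hits whose score is at least the cutoff.
    static List<Integer> keepAbove(float[] scores, float cutoff) {
        List<Integer> kept = new ArrayList<>();
        for (int i = 0; i < scores.length; i++) {
            if (scores[i] >= cutoff) {
                kept.add(i);
            }
        }
        return kept;
    }
}
```

With raw scores of 4.0, 2.0 and 0.4 the top score is above 1.0, so
everything is divided by 4.0 and the last hit falls under a 0.2
cutoff.  With raw scores of 0.5, 0.3 and 0.05 nothing is scaled, so
the same 0.2 cutoff really means "40% of the top score", which is the
case to keep in mind when tuning.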

-chris
