lucene-java-user mailing list archives

From Chun Wei Ho
Subject Help: tweaking search - reducing IDF skew and implementing score cutoff
Date Fri, 10 Feb 2006 07:25:49 GMT

I am running a search for something akin to a news site, where each
news document has date, title, keywords/bylines and summary fields,
followed by the actual content. Using Lucene for this database of
documents, it seems that:

1. The relevancy score is skewed drastically by the number of news
documents per day. For example, if I run a RangeQuery over this week,
and there were 100 news documents on Monday and two on Sunday, the two
from Sunday get ranked highly because of idf. How do I reduce this
skew caused by the date-posted field? I saw a reference earlier to
ConstantScoreRangeQuery on JIRA - is it the solution?
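
To make the skew concrete, here is a rough sketch of the arithmetic as
I understand it. It assumes the idf formula used by Lucene's default
Similarity, 1 + ln(numDocs / (docFreq + 1)), and a hypothetical index
of 1000 documents. It is plain Python rather than Lucene code, and the
function names are made up for illustration.

```python
import math


def classic_idf(doc_freq, num_docs):
    """Assumed idf formula: 1 + ln(numDocs / (docFreq + 1))."""
    return 1.0 + math.log(num_docs / (doc_freq + 1))


def constant_range_score(doc_freq, num_docs, boost=1.0):
    """Hypothetical stand-in for a constant-scoring range clause:
    the score ignores how many documents share the date term."""
    return boost


NUM_DOCS = 1000  # hypothetical index size, not from my real data

busy_day = classic_idf(100, NUM_DOCS)  # date term shared by 100 documents
quiet_day = classic_idf(2, NUM_DOCS)   # date term shared by only 2 documents

# The rarer date term gets the larger weight, which is the skew in question.
print("idf for busy day :", round(busy_day, 3))
print("idf for quiet day:", round(quiet_day, 3))

# With a constant score, both days contribute the same amount.
print("constant score   :", constant_range_score(100, NUM_DOCS),
      constant_range_score(2, NUM_DOCS))
```

If the date clause contributed a constant like this, the ranking would
depend only on the text fields, which is the behaviour I am after.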

2. If I choose to sort the results by date, then recent documents with
very low relevancy (say the search words appear only in the content,
and not in the title/bylines/summary fields that are boosted higher)
are still shown relatively high in the list, and I wish to omit them
in general. What is the best way to implement some sort of relevancy
filter (include only documents with a normalized score of 0.2 or
more)? Or is there a better way around it?
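
The kind of cutoff I have in mind could be sketched as follows. This
is plain Python, not Lucene code. It assumes each hit is available as
a (doc_id, raw_score, posted_date) tuple, which is a hypothetical
shape, and that "normalized" means divided by the top score of the
result set.

```python
from datetime import date


def filter_then_sort_by_date(hits, min_score=0.2):
    """Drop hits whose normalized score is below min_score, then
    return the survivors newest first.

    hits: list of (doc_id, raw_score, posted_date) tuples (assumed shape).
    """
    if not hits:
        return []
    top = max(score for _, score, _ in hits)
    if top <= 0:
        return []
    kept = [
        (doc_id, score / top, posted)
        for doc_id, score, posted in hits
        if score / top >= min_score
    ]
    # Sort by date only after the relevancy cutoff has been applied.
    return sorted(kept, key=lambda hit: hit[2], reverse=True)


# Made-up sample hits, purely for illustration.
sample = [
    ("a", 4.0, date(2006, 2, 6)),
    ("b", 0.4, date(2006, 2, 9)),  # recent, but matched only weakly
    ("c", 2.0, date(2006, 2, 8)),
]
for doc_id, norm_score, posted in filter_then_sort_by_date(sample):
    print(doc_id, norm_score, posted)
```

The point of the ordering is that the weakly matching recent document
is removed before the date sort, so it cannot rise to the top.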

Thanks :)

Best Regards,
