lucene-java-user mailing list archives

Site index · List index
Message view « Date » · « Thread »
Top « Date » · « Thread »
From Andrzej Bialecki ...@getopt.org>
Subject Re: boosting instead of sorting WAS: to boost or not to boost
Date Thu, 21 Dec 2006 11:31:58 GMT
Martin Braun wrote:
> Hi Daniel,
>
>   
>>> so a doc from 1973 should get a boost of 1.1973 and a doc of 1975 should
>>> get a boost of 1.1975 .
>>>       
>> The boost is stored with a limited resolution. Try boosting one doc by 10, 
>> the other one by 20 or something like that.
>>     
>
> You're right. I thought that with the float values the resolution should
> be good enough!
> But there is only a difference in the score with a boosting diff of 0.2
> (e.g. 1.7 and 1.9).
>
> I know that there were many questions on the list regarding scoring
> better new documents.
> But I want to avoid any overhead like "FunctionQuery" at query time,
> and in my case I have some documents
> which have same values in many fields (=>same score) and the only
> difference is the year.
>
> However  I don't want to overboost the score so that the scoring for
> other criteria is not considered.
>
> Shortly spoken: As a result of a search I have a list of book titles and
> I want  a sort by score AND by year of publication.
>
> But for performance reasons I want to avoid this sorting at query-time
> by boosting at index time.
>
> Is that possible?
>   

Here's the trick that works for me, without the issues of boost 
resolution or FunctionQuery.

Add a separate field, say "days", in which you will put as many "1" as 
many days elapsed since the epoch (not neccessarily since 1 Jan 1970 - 
pick a date that makes sense for you). Then, if you want to prioritize 
newer documents, just add "+days:1" to your query. Voila - the final 
results are a sum of other score factors plus a score factor that is 
higher for more recent document, containing more 1-s.

If you are dealing with large time spans, you can split this into years 
and days-in-a-year, and apply query boosts, like "+years:1^10.0 
+days:1^0.02". Do some experiments and find what works best for you.


-- 
Best regards,
Andrzej Bialecki     <><
 ___. ___ ___ ___ _ _   __________________________________
[__ || __|__/|__||\/|  Information Retrieval, Semantic Web
___|||__||  \|  ||  |  Embedded Unix, System Integration
http://www.sigram.com  Contact: info at sigram dot com



---------------------------------------------------------------------
To unsubscribe, e-mail: java-user-unsubscribe@lucene.apache.org
For additional commands, e-mail: java-user-help@lucene.apache.org


Mime
View raw message