lucene-general mailing list archives

From cristh
Subject Choosing boosting in Lucene
Date Sat, 16 Apr 2011 15:39:52 GMT

I have a few questions about boosting in Lucene. I am running a research
project where I have, for each document, 4 fields: f1, f2, f3, f4. I also
have a set of queries for my corpus, and I know the relevant documents for
each of these queries. What I want to study is how boosting affects the
search results of these queries. Basically, I want to show that by boosting
some of these fields the results are better (I hope).
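To make "better" concrete, one common way to compare runs against known relevant documents is mean average precision (MAP). The following is only a minimal sketch under my own assumptions: the function names, the document ids, and the dictionary layout are hypothetical, and no Lucene API is involved. Each run is simply the ranked list of document ids returned for a query under one boost setting.

```python
def average_precision(ranked_ids, relevant_ids):
    """Average precision of one ranked list against the known relevant docs."""
    relevant = set(relevant_ids)
    if not relevant:
        return 0.0
    hits = 0
    precision_sum = 0.0
    for rank, doc_id in enumerate(ranked_ids, start=1):
        if doc_id in relevant:
            hits += 1
            # Precision at the rank where a relevant document was found.
            precision_sum += hits / rank
    return precision_sum / len(relevant)


def mean_average_precision(runs, judgments):
    """Mean of average precision over all judged queries.

    runs:      {query_id: ranked list of doc ids} (hypothetical layout)
    judgments: {query_id: iterable of relevant doc ids}
    """
    if not judgments:
        return 0.0
    total = sum(average_precision(runs.get(q, []), rel)
                for q, rel in judgments.items())
    return total / len(judgments)
```

Computing this value once per boost setting gives a single number to compare, for example the unboosted run against a run with f4 boosted.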
I have, though, a few essential questions that I cannot figure out and I
would really appreciate some help...

1. Is there any difference between boosting the fields at index time and
boosting the query terms that appear in these fields at search time?
Again, I know beforehand the set of queries and also the terms in these
queries which appear in the documents in the corpus in each of the fields.

2. In what range are boosting values usually chosen? I.e., should I choose
boosts in a 0.5-2 range (say 0.5, 1, 1.5, 2), like I have seen in some
examples, or is it the same if I choose boosts in a range like 50-200
(50, 100, 150, 200, respectively)?

3. How sensitive is boosting in Lucene? For example, if I know approximately
the importance of each field, and I want to assign boosting values
accordingly, what would be good differences between the values of the
boosting factor for the different fields? More precisely, if the importance
order is f1<f2<f3<f4, will it matter if I choose the boosts as (1,2,3,4), or
(1, 5, 10, 15)?
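To illustrate what questions 2 and 3 are getting at, here is a toy model, not Lucene's actual scoring: it assumes a document's score is just the boost-weighted sum of per-field match scores. The per-field scores and document names are made up, and real Lucene scoring includes other factors (such as normalization) that this sketch ignores. Under that assumption, multiplying every boost by the same constant leaves the ranking unchanged, while changing the ratios between boosts can reorder documents.

```python
def rank(docs, boosts):
    """Return doc ids sorted by boost-weighted score, best first (toy model)."""
    def score(field_scores):
        return sum(b * s for b, s in zip(boosts, field_scores))
    return sorted(docs, key=lambda d: score(docs[d]), reverse=True)


# Hypothetical per-field match scores (f1, f2, f3, f4) for a single query.
docs = {
    "A": (0.0, 1.0, 0.0, 0.0),  # matches only in f2
    "B": (3.0, 0.0, 0.0, 0.0),  # matches strongly in f1
    "C": (0.0, 0.0, 0.0, 0.9),  # matches only in f4
}

# Same ratios, different scale: the ordering is identical.
small_scale = rank(docs, (0.5, 1, 1.5, 2))
large_scale = rank(docs, (50, 100, 150, 200))

# Different ratios: the ordering of A and B flips.
linear = rank(docs, (1, 2, 3, 4))
steep = rank(docs, (1, 5, 10, 15))
```

In this toy setting only the ratios between the boosts matter, which is why (1,2,3,4) and (1,5,10,15) are not interchangeable even though both respect f1<f2<f3<f4.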

4. Is there any method besides trial and error for finding the boosts for
each field that work the best for a particular corpus? 

Thank you very much,
