lucene-java-user mailing list archives

Site index · List index
Message view « Date » · « Thread »
Top « Date » · « Thread »
From Yiannis Gkoufas <johngou...@gmail.com>
Subject Re: Choosing boosting in Lucene
Date Sat, 16 Apr 2011 15:54:07 GMT
On Sat, Apr 16, 2011 at 6:43 PM, HAIDUC SONIA <haiduc_sonia@yahoo.com>wrote:

> Hello,
>
> I have a few questions about boosting in Lucene. I am running a research
> project where I have, for each document, 4 fields: f1, f2, f3, f4. I also
> have a set of queries for my corpus, and I know the relevant documents for
> each of these queries. What I want to study is how boosting affects the
> search results of these queries. Basically, I want to show that by boosting
> some of these fields the results are better (I hope).
> I have, though, a few essential questions that I cannot figure out and I
> would really appreciate some help...
>
> 1. Is there any difference between boosting the fields at index time and
> boosting the terms in the queries which appear in these fields at search
> time?
>

I have noticed that for some queries the results are the same, for others
not. It should be the same, if someone knows why is it happening it would be
very useful.


> Again, I know beforehand the set of queries and also the terms in these
> queries which appear in the documents in the corpus in each of the fields.
>
> 2. In what range are boosting values usually chosen? I.e., should I choose
> boosts in a 0.5-2 range (say 0.5, 1, 1.5, 2), like I have seen in soem
> examples, or is it the same if I choose boosts in a range like 50-200
> (respectively 50, 100, 150, 200)?
>

After running numerous tests I have concluded that a boost factor up to 20
makes sense.


>
> 3. How sensitive is boosting in Lucene? For example, if I know
> approximately
> the importance of each field, and I want to assign boosting values
> accordingly, what would be good differences between the values of the
> boosting factor for the different fields? More precisely, if the importance
> order is f1<f2<f3<f4, will it matter if I choose the boosts as (1,2,3,4),
> or
> (1, 5, 10, 15)?
>

Yes, that is certain.


>
> 4. Is there any method besides trial and error for finding the boosts for
> each field that work the best for a particular corpus?
>

In order to get the best boosts for my tests, I tried brute-forcing it.
In theory, you can experiment with vector space models, modified perceptron.


>
> Thank you very much,
> Cristina
>

Best Regards,
Yiannis

Mime
  • Unnamed multipart/alternative (inline, None, 0 bytes)
View raw message