lucene-java-user mailing list archives

From Robert Muir
Subject Re: problem in Lucene's ranking function
Date Wed, 05 May 2010 19:10:03 GMT
2010/5/5 José Ramón Pérez Agüera

> Hi Robert,
> the problem is not the linear combination of fields, the problem is
> applying the boost factor per field after the term frequency saturation
> function and only then making the linear combination of fields. Every
> system that implements BM25F, including Terrier, takes care of that,
> because if you don't do it you have a bug in your ranking function and
> not just a different ranking function.

José, well then this should not be much of a problem to handle in
LUCENE-2392, because as I mentioned, if you have a tf() or idf() it's really
because you decided to do this yourself. So you could easily apply the boost
inside your log or sqrt or whatever, if you want.
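
To make the difference concrete, here is a minimal sketch (plain Java, not
Lucene code). It assumes sqrt as the saturation function, two fields, and
made-up frequencies and boosts; the class and method names are hypothetical
and only illustrate where the boost is applied.

```java
// Illustrative only: compares the two places a per-field boost can be applied.
final class BoostPlacement {

    // Assumed saturation function (sqrt); any log or sqrt-like curve would do.
    static double saturate(double x) {
        return Math.sqrt(x);
    }

    // Saturate each field's term frequency first, then boost and sum.
    // This is the ordering José describes as the bug.
    static double boostOutside(double[] tf, double[] boost) {
        double score = 0.0;
        for (int i = 0; i < tf.length; i++) {
            score += boost[i] * saturate(tf[i]);
        }
        return score;
    }

    // Boost the raw term frequencies, combine the fields, then saturate once.
    // This is the BM25F-style ordering (boost inside the sqrt).
    static double boostInside(double[] tf, double[] boost) {
        double weighted = 0.0;
        for (int i = 0; i < tf.length; i++) {
            weighted += boost[i] * tf[i];
        }
        return saturate(weighted);
    }

    public static void main(String[] args) {
        double[] tf = {4.0, 9.0};     // made-up frequencies: title, body
        double[] boost = {4.0, 1.0};  // made-up boosts: title, body
        System.out.println(boostOutside(tf, boost)); // 4*sqrt(4) + 1*sqrt(9) = 11.0
        System.out.println(boostInside(tf, boost));  // sqrt(4*4 + 1*9) = 5.0
    }
}
```

With the boost outside, the boosted field's contribution grows linearly with
the boost, so the saturation no longer limits it; with the boost inside, the
combined frequency is saturated as a whole.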

But what I propose we do is make sure the relevance functions we provide
(especially any default for 4.0) take care of this for your structured case,
while still providing the capability for someone to get the old behavior
[see below]

> If you implement this little
> change, Lucene's ranking function will work properly with structured
> documents, and all your other concerns about allowing users to
> implement different ranking functions for different situations will
> not be affected by this change.
Well, I'm not sure all my concerns go away! I think it's best to implement a
change like this in the flexible scoring framework (LUCENE-2392), so that
users, if they want, can get the old behavior: "the bug" as you call it.

The reason I say this is that Lucene has some unique use cases: some people
are doing scoring in very crazy ways, and if they aren't able to get the old
behavior with regard to boosting, they might be upset... even if it is
really giving them worse relevance...

Robert Muir
