lucene-java-user mailing list archives

From "Uwe Schindler" <...@thetaphi.de>
Subject RE: Is it possible to normalise BM25 scores in the query level?
Date Tue, 27 Jun 2017 17:19:05 GMT
Hi,
> In our use case, we want to perform learning to rank and train a decision
> tree using BM25 scores as one of our features. Decision trees require
> normalised features to be able to split the data properly. Since BM25 scores
> for different queries vary considerably, the decision tree cannot find a
> suitable threshold to split on.

The "old Lucene" query normalization has nothing to do with BM25. It was computed from the
query only, just to keep the numbers around 1 (a leftover from the early days of Lucene,
when huge scores led to rounding problems). It was removed in Lucene 7 together with the
TF-IDF-based "coordination factors" in boolean queries. In fact this is an improvement,
because the normalization scaled the values by a query-dependent factor, making scores
from different queries impossible to compare.
 
> What was the normalisation in Lucene 6? We are using Lucene 6.4.2 but could
> not find any way to normalise BM25 scores other than hacking into the code.

In Lucene 7 the scores are no longer normalized and are much easier to compare between queries
of similar structure and across different indexes, but still with no guarantees (of course,
comparing queries with different numbers of words or completely different structure is still
not easily possible). Plain word-based queries ("match" query in Elasticsearch) should be fine
if you add your own normalization on the number of terms in the query (e.g., divide the final
score by the number of terms). For LTR purposes this should be fine. I'd try the Lucene 7
master version to check whether this helps for your use case.
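
As a rough illustration of the "divide the final score by the number of terms" idea, here is
a minimal sketch in plain Java. It assumes the raw BM25 score has already been read from the
search result, and it counts terms by splitting on whitespace, which only approximates what
the real analyzer would produce. The class and method names (ScoreNormalizer, countTerms,
normalizeByTermCount) are made up for this example and are not part of Lucene.

```java
/** Hypothetical helper for building an LTR feature; not a Lucene class. */
final class ScoreNormalizer {
    private ScoreNormalizer() {}

    /**
     * Counts whitespace-separated terms in the query text.
     * Assumption: this stands in for the analyzer actually used at query time,
     * which may drop stop words or split tokens differently.
     */
    static int countTerms(String queryText) {
        String trimmed = queryText.trim();
        return trimmed.isEmpty() ? 0 : trimmed.split("\\s+").length;
    }

    /**
     * Divides the raw score by the number of query terms.
     * Returns 0.0 for an empty query so that no division by zero occurs.
     */
    static double normalizeByTermCount(double rawScore, String queryText) {
        int terms = countTerms(queryText);
        return terms == 0 ? 0.0 : rawScore / terms;
    }
}
```

Calling ScoreNormalizer.normalizeByTermCount(score, queryText) on each hit would give a
per-term average to use as the feature. If the index analyzer removes or splits tokens, it is
probably safer to take the term count from the analyzed query instead of the raw string.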

Uwe

> --
> View this message in context: http://lucene.472066.n3.nabble.com/Is-it-
> possible-to-normalise-BM25-scores-in-the-query-level-
> tp4342991p4343048.html
> Sent from the Lucene - Java Users mailing list archive at Nabble.com.
> 
> ---------------------------------------------------------------------
> To unsubscribe, e-mail: java-user-unsubscribe@lucene.apache.org
> For additional commands, e-mail: java-user-help@lucene.apache.org

