lucene-java-user mailing list archives

From Erick Erickson <erickerick...@gmail.com>
Subject Re: Scoring over Multiple Indexes
Date Thu, 22 Oct 2015 15:43:41 GMT
In a word, no. At least not that I've heard of. "normalizing scores" is
one of those things that sounds reasonable on the surface, but is really
meaningless. Scores don't really _tell_ you anything about the abstract
"goodness" of a doc, they just tell you that doc1 is likely better than
doc2 _within a single query_. You can't even compare scores in the
_same_ index across two different queries.

At its lowest level, say one index has 1,000,000 occurrences of "erick",
while index 2 has exactly 1. Term frequency is one of the numbers that
is used to calculate the score. How does one normalize the part of the
calculation resulting from matching "erick" between the two indexes?
Anything you do is wrong.
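
To make the point concrete, here is a minimal sketch in plain Java with
no Lucene dependency. It assumes a simplified TF-IDF weight loosely
modeled on Lucene's classic similarity (tf = sqrt(freq), idf = 1 +
ln(numDocs / (docFreq + 1))) and leaves out norms and the other factors
in real scoring. The class name, method names, and index sizes are
invented for illustration, and "occurrences" is treated as the number of
documents containing the term.

```java
class CrossIndexScoreSketch {
    // Assumed simplification: term-frequency component grows with sqrt(freq).
    static double tf(int freq) {
        return Math.sqrt(freq);
    }

    // Assumed simplification: inverse document frequency from per-index statistics.
    static double idf(long docFreq, long numDocs) {
        return 1.0 + Math.log((double) numDocs / (docFreq + 1));
    }

    // Score contribution of one term in one document, using one index's statistics.
    static double termScore(int freq, long docFreq, long numDocs) {
        return tf(freq) * idf(docFreq, numDocs);
    }

    public static void main(String[] args) {
        long numDocs = 2_000_000L; // hypothetical size, same for both indexes

        // The same document text (one occurrence of "erick") scored against
        // two different indexes.
        double inCommonIndex = termScore(1, 1_000_000L, numDocs); // "erick" is everywhere
        double inRareIndex = termScore(1, 1L, numDocs);           // "erick" appears once

        System.out.println("index 1 (common term): " + inCommonIndex);
        System.out.println("index 2 (rare term):   " + inRareIndex);
    }
}
```

The identical match gets a much larger weight in the index where the
term is rare, so the two numbers are not on a shared scale and there is
nothing principled to rescale them against.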

Similarly, expecting documents to be returned in a particular order
because of boosting is not going to be satisfactory. Boosting will
influence the final score and thus the position of the document, but not
absolutely order them unless you put in insane boosts. Tests based on
boosting and doc ordering will be very fragile I'd guess.
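
A rough sketch of why boosts only nudge the ordering, again in plain
Java rather than Lucene. It assumes the combined score of an AND query
is simply the sum of boost times raw per-term score. The class name, the
boosts, and the raw scores are all made up for illustration.

```java
class BoostOrderSketch {
    // Assumed model: combined score = sum over clauses of boost * raw term score.
    static double score(double[] boosts, double[] termScores) {
        if (boosts.length != termScores.length) {
            throw new IllegalArgumentException("one boost per term score is required");
        }
        double total = 0.0;
        for (int i = 0; i < boosts.length; i++) {
            total += boosts[i] * termScores[i];
        }
        return total;
    }

    public static void main(String[] args) {
        // Raw scores for term1, term2, term3 in two hypothetical documents.
        double[] docA = {5.0, 0.5, 0.5}; // matches term1 very strongly
        double[] docB = {0.5, 0.5, 2.0}; // matches term3 best

        double[] modest = {2.0, 3.0, 4.0};   // term1^2 AND term2^3 AND term3^4
        double[] insane = {2.0, 3.0, 400.0}; // term3 boosted far beyond the others

        System.out.println("modest boosts: A=" + score(modest, docA)
                + " B=" + score(modest, docB));
        System.out.println("insane boosts: A=" + score(insane, docA)
                + " B=" + score(insane, docB));
    }
}
```

With the modest boosts, docA still ranks first even though term3 has the
largest boost, because its raw term1 score dominates. Only the very
large boost forces docB to the top.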

Best,
Erick

On Thu, Oct 22, 2015 at 8:34 AM, Bauer, Herbert S. (Scott)
<Bauer.Scott@mayo.edu> wrote:
> We have a test case that boosts a set of terms. Something along the
> lines of “term1^2 AND term2^3 AND term3^4”, and this query runs over
> two content-distinct indexes. Our expectation is that the terms would
> be returned to us as term3, term2 and term1. Instead we get something
> along the lines of term3, term1 and term2. I realize from a number of
> postings that this is the result of the scoring method's action taking
> place within an individual index rather than against several indexes.
> At the same time I don’t see a lot of solutions offered. Is there an
> out-of-the-box solution to normalize scoring over diverse indexes? If
> not, is there a strategy for rolling your own normalizing solution?
> I’m assuming this has to be a common problem. -scott
>


