lucene-java-user mailing list archives

Site index · List index
Message view « Date » · « Thread »
Top « Date » · « Thread »
From "Srinivas.N." <>
Subject Re: Question on custom scoring
Date Wed, 15 Aug 2007 01:07:45 GMT

Could be normalized relative to the max score among the matching documents -
but I realize that this can only be done AFTER collecting the documents (as
the Hits class does currently). It could also be normalized to some
"absolute relevance score" that is comparable across queries, but there is
no notion of absolute score in Lucene, is there?

Let me explain the problem I am trying to solve:

I need to weigh in multiple factors when returing matching documents for a
query. Lets say there are 2 factors - one is "relevance" as determined by
the IR score (tf, idf, norms etc.) from lucene and another is say
"popularity". For example, lets say 2 documents match a query "paris
hilton". One document maybe much more popular than another but very close in
"relevance". I'd like to find some way of including the popularity of the
document in the scoring formula, and I thought the CustomScoreQuery would be
useful, but I am realizing that it may not be easy because the "relevance"
score from Lucene has no absolute meaning. The relevance score could be 5 or
500 and there is no way for me gauge what that number means and how much I
should weigh he "popularity" value relative to it when computing the custom

So, the best way I can think of doing this right now is as follows:

If I need to return n docs as the result, use the Hits API to get a much
larger number say 2n or 3n (or till the normalized score relative to the
maximum score falls below a certain threshold), and then sort the results by
own algorithm which combines the normalized score returned by the Hits API
and the popularity.



hossman wrote:
> : [b] The subQueryScore that is returned in the customScore() API doesnt
> seem
> : to be normalized. This makes it difficult for me to weight the different
> : scores properly. Is there an easy way to get the normalized "text index"
> : score in customScore() API so I can easaily weigh in the other factors
> : relative to it?
> normalized relative to what?
> -Hoss
> ---------------------------------------------------------------------
> To unsubscribe, e-mail:
> For additional commands, e-mail:

View this message in context:
Sent from the Lucene - Java Users mailing list archive at

To unsubscribe, e-mail:
For additional commands, e-mail:

View raw message