lucene-java-user mailing list archives

Site index · List index
Message view « Date » · « Thread »
Top « Date » · « Thread »
From Erick Erickson <>
Subject Re: Hit score and confidence ratio in results
Date Thu, 02 Dec 2010 01:27:43 GMT
If I understand your problem space, you're trying to compare scores
across two different queries. Don't do this, it is meaningless. Scores
are only valid for comparing documents' relevance within a single
query. How would one compare scores between two queries like
? Different fields, different values. What does it mean that the score
for a doc matching the first query has a higher or lower score than
the first doc matching the second query?

As for how scores are > 1, have you looked at the scoring
formula here?
and here? the
Scoring Formula

Don't forget that there can also be boosts...


On Wed, Dec 1, 2010 at 7:56 AM, Hassan Saneifar <>wrote:

> Hello,
> I'm using lucene to retrieve relevant segments of a corpus based on a
> given query. Every segment is represented as Document in the indexing.
> Once the relevant segments are retrieved, I search for a Regex in them
> to capture the requested information. It can happens that I found the
> regex in two or more documents retrieved by lucene. Thus, I would like
> to show a confidence score to each captured Regex. To make simple, I
> means if the regex is found in the first retrieved document and in
> fourth, the one retrieved in the first document is more certain to be
> relevant since the its document was better scored than other one.
> To show a confidence score, I tried to use scores returned by Lucene.
> But, the lucene score are arbitrary and not normalized. So it's not
> relevant to use. I wonder how we can have an arbitrary score (>1) when
> the default scoring system in lucene is based on cosine measure. In a
> simple case, the cosine score between two document vectors (obtained by
> tf-idf) is between 0 and 1.
> Since the lucene score is arbitrary and not normalized, if I want to
> verify which query is more relevant to retrieve a document, how can I
> compare the score of the first retrieved documents corresponding to each
> query ? Is there any way to give a confidence score to each retrieved
> document which make the comparison of the results of the different
> search using the different queries possible?
> Many thank
> Best regards.
> --
> Hassan Saneifar, PhD. student,
> University Montpellier II
> LIRMM laboratory
> 161 rue Ada 34392 Montpellier, France
> Tel: +33 (0)467 41 85 41
> Fax: +33 (0)467 41 85 00
> URL:
> --
> Hassan Saneifar, R&D engineer
> Satin IP Technologies
> Tel: +33 (0)467 13 00 86
> URL:
> ---------------------------------------------------------------------
> To unsubscribe, e-mail:
> For additional commands, e-mail:

  • Unnamed multipart/alternative (inline, None, 0 bytes)
View raw message