lucene-java-user mailing list archives

From Erick Erickson
Subject Re: Different score for the same documents
Date Mon, 02 Nov 2009 12:59:19 GMT
What were their scores? I'm assuming that by "rank" you mean
the order in which the documents were returned, not the raw Lucene
score.

Lucene uses the insertion order to break ties. That is, two documents
with the same score will appear in the order of their (internal)
Lucene doc ID.
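
To make the tie-breaking concrete, here is a minimal sketch in plain Java. It does not use Lucene at all: the Hit class and the rank method are hypothetical stand-ins, and the scores are made up. It only illustrates the ordering described above, score first and doc ID second.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;

// Hypothetical stand-in for a search hit; this is not a Lucene class.
final class Hit {
    final int docId;   // internal doc ID, assumed to follow insertion order
    final float score; // relevance score for the query

    Hit(int docId, float score) {
        this.docId = docId;
        this.score = score;
    }
}

final class TieBreakSketch {
    // Order hits by score (highest first); break ties by doc ID (lowest first).
    static List<Hit> rank(List<Hit> hits) {
        List<Hit> sorted = new ArrayList<>(hits);
        sorted.sort((a, b) -> {
            int byScore = Float.compare(b.score, a.score);
            return byScore != 0 ? byScore : Integer.compare(a.docId, b.docId);
        });
        return sorted;
    }

    public static void main(String[] args) {
        // Made-up data: docs 7 and 2 tie on score, doc 5 scores higher.
        List<Hit> hits = Arrays.asList(
                new Hit(7, 0.5f), new Hit(2, 0.5f), new Hit(5, 0.9f));
        for (Hit h : rank(hits)) {
            System.out.println("doc=" + h.docId + " score=" + h.score);
        }
    }
}
```

With this data, doc 5 comes first, then the two tied documents in doc ID order (2 before 7).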

So is it possible that *all* of the documents that appear between these
two have the exact same score for that query? That seems a bit
unlikely, but it's worth checking before going much further.
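
One way to run that check, sketched in plain Java under some assumptions: the scores have already been copied into an array in the order the hits were returned (in a Lucene application they would come from the score field of each ScoreDoc), and ranks are 1-based. The class and method names are hypothetical, and the scores are made up.

```java
final class TieCheckSketch {
    // Returns true when every hit from fromRank to toRank (1-based, inclusive)
    // has exactly the same score as the hit at fromRank.
    static boolean allScoresEqual(float[] rankedScores, int fromRank, int toRank) {
        float first = rankedScores[fromRank - 1];
        for (int rank = fromRank; rank <= toRank; rank++) {
            // Exact comparison on purpose: a tie means identical scores.
            if (Float.compare(rankedScores[rank - 1], first) != 0) {
                return false;
            }
        }
        return true;
    }

    public static void main(String[] args) {
        // Made-up scores, listed in the order the hits were returned.
        float[] scores = {0.9f, 0.8f, 0.5f, 0.5f, 0.5f, 0.4f};
        System.out.println(allScoresEqual(scores, 3, 5)); // true
        System.out.println(allScoresEqual(scores, 3, 6)); // false
    }
}
```

For the case in this thread the call would be allScoresEqual(scores, 3, 18). If it returns false, tie-breaking alone does not explain the gap between the two duplicates.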


On Mon, Nov 2, 2009 at 7:45 AM, kenji tsuruoka wrote:

> Dear Lucene users,
> Hi.
> I have tried to index and search MEDLINE abstracts with Lucene,
> and there were some problems in the search results.
> Specifically, Lucene assigned different ranks to exactly identical documents.
> At first I did not know that the input documents for the index contained
> duplicates.
> I have solved the problem by making all input documents unique for the
> index,
> but I want to know how and why this happened.
> The duplicate document is as follows:
> _pubmed_id=13029105:1952Nov15
> _ArticleTitle_
> <s n="1">Experimental diabetes and clinical diabetes.</s>
> _pubmed_id_end_
> There are TWO exactly identical documents in the index,
> and their rankings by Lucene are 3 and 18.
> I know that text in XML/HTML data should be extracted before indexing,
> but I haven't done that yet.
> Please let me know why the same documents were given different
> ranks.
> Best,
> K