lucene-java-user mailing list archives

From kenji tsuruoka
Subject Different score for the same documents
Date Mon, 02 Nov 2009 12:45:53 GMT
Dear Lucene users,

I have been indexing and searching MEDLINE abstracts with Lucene.

However, I ran into a problem with the search results:
Lucene assigned different ranks to exactly the same documents.

At first I did not know that the input documents for the index
contained duplicates.
I have since solved the problem by making all input documents unique for the index.
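
As an illustration only, one way to make the input texts unique before they reach the indexer could look like the sketch below. The class and method names (UniqueDocs, unique) are hypothetical, it only catches exact duplicates after trimming whitespace, and it does not use any Lucene API.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.LinkedHashSet;
import java.util.List;
import java.util.Set;

// Hypothetical helper, not part of Lucene: drops exact duplicate texts
// before they are handed to whatever code builds the index.
class UniqueDocs {

    // Returns the input texts with exact duplicates removed,
    // keeping the order in which each text was first seen.
    static List<String> unique(List<String> texts) {
        Set<String> seen = new LinkedHashSet<>();
        for (String text : texts) {
            // Trim so that trailing whitespace alone does not make two texts differ.
            seen.add(text.trim());
        }
        return new ArrayList<>(seen);
    }

    public static void main(String[] args) {
        List<String> docs = Arrays.asList(
            "<s n=\"1\">Experimental diabetes and clinical diabetes.</s>",
            "<s n=\"1\">Experimental diabetes and clinical diabetes.</s>");
        // Only one copy remains after de-duplication.
        for (String doc : unique(docs)) {
            System.out.println(doc);
        }
    }
}
```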

But I would like to understand how and why this happened.

The duplicated document is as follows:

<s n="1">Experimental diabetes and clinical diabetes.</s>

There are two exact copies of this document in the index,
and Lucene ranks them 3rd and 18th.

I know that the text in XML/HTML data should be extracted before indexing,
but I have not done this yet.
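
For reference, a very rough sketch of such text extraction is shown below. It is an assumption-laden illustration: TagStripper and stripTags are hypothetical names, and the regular expression simply drops anything that looks like a tag, so entities, comments and CDATA sections are not handled. A real XML or HTML parser would be the safer choice.

```java
// Hypothetical helper, not part of Lucene: naive tag removal for
// simple markup such as the <s n="1">...</s> element shown above.
class TagStripper {

    // Removes every "<...>" run and trims the result.
    // Entities (e.g. &amp;), comments and CDATA are NOT handled.
    static String stripTags(String markup) {
        return markup.replaceAll("<[^>]+>", "").trim();
    }

    public static void main(String[] args) {
        String doc = "<s n=\"1\">Experimental diabetes and clinical diabetes.</s>";
        // Prints the sentence without the surrounding <s> element.
        System.out.println(stripTags(doc));
    }
}
```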

Please let me know why the same document was given
different ranks.

