lucene-java-user mailing list archives

Site index · List index
Message view « Date » · « Thread »
Top « Date » · « Thread »
From kenji tsuruoka <kenji.tsuru...@gmail.com>
Subject Different score for the same documents
Date Mon, 02 Nov 2009 12:45:53 GMT
Dear. Lucene users.

Hi.
I have tried to index and search MEDLINE abstracts by LUCENE.

And there were some problems in the search results.
That is Lucene has assigned different ranks for the exactly same  
documents.

I didn't know the input documents for the index contain duplicate  
documents at the first time.
I have solve the problem by making all input documents UNIQUE for the  
index.

But I want to know how and why the situation was happened.

The duplicate document is as follows:

_pubmed_id=13029105:1952Nov15
_ArticleTitle_
<s n="1">Experimental diabetes and clinical diabetes.</s>
_pubmed_id_end_

There are TWO exactly same documents in "index".
And their rankings by Lucene are 3 and 18.

I have known texts in XML/HTML data should be extracted before indexing.
Anyway, I haven't done this work now.

Please let me know the reason why the same documents were shown  
different ranks.

Best,
K

---------------------------------------------------------------------
To unsubscribe, e-mail: java-user-unsubscribe@lucene.apache.org
For additional commands, e-mail: java-user-help@lucene.apache.org


Mime
View raw message