lucene-java-user mailing list archives

Site index · List index
Message view « Date » · « Thread »
Top « Date » · « Thread »
From Victor Hadianto <vict...@nuix.com.au>
Subject Merging indexes and removing duplicates.
Date Fri, 09 May 2003 00:52:40 GMT
Hi all,

In our application we are indexing a massive amount of documents in parallel 
over 16 machines. Our problem is that we have duplicate documents in the 
input and the same document is indexed multiple times over the boxes.

During the merging to create one mega index I ended up with the same document 
duplicated in the index, thus when doing a search on a document it returns 
more than one result.

Is there a way during the merging to tell Lucene not to index a document 
(given a Lucene field) if it's already exist in the other index? Or do I have 
to remove the duplicate index myself?

regards,

victor




---------------------------------------------------------------------
To unsubscribe, e-mail: lucene-user-unsubscribe@jakarta.apache.org
For additional commands, e-mail: lucene-user-help@jakarta.apache.org


Mime
View raw message