lucene-java-user mailing list archives

From Doug Cutting <cutt...@apache.org>
Subject Re: Possible to remove duplicate documents in sort API?
Date Tue, 07 Sep 2004 16:49:33 GMT
Kevin A. Burton wrote:
> My problem is that I have two machines... one for searching, one for 
> indexing.
> 
> The searcher has an existing index.
> 
> The indexer found an UPDATED document and then adds it to a new index 
> and pushes that new index over to the searcher.
> 
> The searcher then reloads and when someone performs a search BOTH 
> documents could show up (including the stale document).
> 
> I can't do a delete() on the searcher because the indexer doesn't have 
> the entire index as the searcher.

I can think of a couple of ways to fix this.

If the indexer box kept copies of the indexes that it has already sent 
to the searcher, then it could mark updated documents as deleted in those 
old indexes.  Then, along with each new index, you could also distribute 
new .del files for the old indexes.

Alternatively, on the searcher box, before you open the new index, you 
could open an IndexReader on each of the existing indexes and mark as 
deleted, in those old indexes, every document that also appears in the 
new index.  This shouldn't take more than a few seconds.
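
A rough sketch of that searcher-side pass, written against plain Java 
collections rather than the Lucene API so it runs on its own.  It assumes 
every document carries a unique key (a URL, say), which the thread implies 
but never states.  StaleDocMarker, OldIndex and markStale are made-up 
names; in a real setup the marking would go through IndexReader.delete() 
on a reader opened over each old index.

```java
import java.util.BitSet;
import java.util.HashMap;
import java.util.List;
import java.util.Map;
import java.util.Set;

// Toy model only: none of these types are part of Lucene.
class StaleDocMarker {

    // Hypothetical stand-in for one old index on the searcher box:
    // a lookup from unique key to document number, plus a deletion bit vector.
    static class OldIndex {
        final Map<String, Integer> docNumByKey = new HashMap<>();
        final BitSet deleted = new BitSet();
    }

    // For every key present in the new index, mark the matching document
    // in each old index as deleted.  Returns how many documents were marked.
    static int markStale(List<OldIndex> oldIndexes, Set<String> newKeys) {
        int marked = 0;
        for (OldIndex old : oldIndexes) {
            for (String key : newKeys) {
                Integer docNum = old.docNumByKey.get(key);
                if (docNum != null && !old.deleted.get(docNum)) {
                    old.deleted.set(docNum);  // stale copy: hide it from searches
                    marked++;
                }
            }
        }
        return marked;
    }
}
```

The work is one key lookup per new document per old index, which is why 
the pass should be cheap when each pushed index is small.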

IndexReader.delete() just sets a bit in a bit vector that is written to 
file by IndexReader.close().  So it's quite fast.
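
To make the cost concrete, here is a small stand-alone model of that 
behaviour: delete() only flips a bit in memory, and the bits reach disk 
once, in close().  This is an illustration, not Lucene's code or its .del 
file format; DeletionBits and its on-disk layout are invented for the 
example.

```java
import java.io.IOException;
import java.io.UncheckedIOException;
import java.nio.file.Files;
import java.nio.file.Path;
import java.util.BitSet;

// Toy model only: NOT Lucene's IndexReader and NOT its .del format.
class DeletionBits implements AutoCloseable {
    private final Path file;       // hypothetical location of the deletions file
    private final BitSet deleted;  // one bit per document number
    private boolean dirty = false; // true once a bit changed since load

    DeletionBits(Path file) {
        this.file = file;
        try {
            // Load earlier deletions if a file is already there.
            this.deleted = Files.exists(file)
                    ? BitSet.valueOf(Files.readAllBytes(file))
                    : new BitSet();
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
    }

    // Marking a document deleted is a single in-memory bit set: no I/O here.
    void delete(int docNum) {
        deleted.set(docNum);
        dirty = true;
    }

    boolean isDeleted(int docNum) {
        return deleted.get(docNum);
    }

    // The bit vector is written out once, and only if something changed.
    @Override
    public void close() {
        if (!dirty) {
            return;
        }
        try {
            Files.write(file, deleted.toByteArray());
            dirty = false;
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
    }
}
```

Marking thousands of documents this way costs thousands of bit 
operations plus a single small file write at the end.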

Doug


