lucene-java-user mailing list archives

Site index · List index
Message view « Date » · « Thread »
Top « Date » · « Thread »
From Paul Elschot <>
Subject Re: Possible to remove duplicate documents in sort API?
Date Mon, 06 Sep 2004 18:04:08 GMT

On Sunday 05 September 2004 23:13, Kevin A. Burton wrote:
> Paul Elschot wrote:
> >Kevin,
> >
> >On Sunday 05 September 2004 10:16, Kevin A. Burton wrote:
> >>I want to sort a result set but perform a group by as well... IE remove
> >>duplicate items.
> >
> >Could you be more precise?
> My problem is that I have two machines... one for searching, one for
> indexing.
> The searcher has an existing index.
> The indexer found an UPDATED document and then adds it to a new index
> and pushes that new index over to the searcher.
> The searcher then reloads and when someone performs a search BOTH
> documents could show up (including the stale document).

What do you mean by reload?
You could try and delete the new documents from the old index beforehand.

I don't know to what extent you can do such updates atomically with
Lucene. One way might be to stop all searches, delete the
new docs from the old index, close the old index, then open a
MultiReader on the old and the new index together, and use this
to do further searches.
This writes the deleted bits for the old index, ie. one byte per 8 docs
and closes and reopens the index files, which takes an amount of  time
that is linear in the old index size. 
But there may be better ways.

Paul Elschot

To unsubscribe, e-mail:
For additional commands, e-mail:

View raw message