lucene-java-user mailing list archives

Site index · List index
Message view « Date » · « Thread »
Top « Date » · « Thread »
From John Haxby <>
Subject Re: Update performance/indexwriter.delete()?
Date Fri, 15 Apr 2005 14:06:16 GMT
Roy Klein wrote:

>Here's the scenario that I can't guarantee won't happen:
>There might be 3 transactions in a very short time span (for example, 1
>second), here's what they are:
>1) update doc1 (DEL doc1, ADD doc1)
>2) update doc2 (DEL doc2, ADD doc2)
>3) delete doc1
>If I process these in order, then at the end of the 3 transactions, my index
>will only have one document in it, doc2.
When I did something similar a while back(*) I added a little 
intelligence to the insertion of jobs into the queue.  Essentially, if 
I'm about to update a document that's already scheduled for updating, 
don't update it twice (delete the first job, ignore the new job, which 
ever is appropriate).  Similarly, if I'm going to delete a document 
that's awaiting update, don't bother with the update (but let the delete 
in so that the old document is deleted.  You can handle new documents 
and updates to them in a similar vein.

For your particular example, the first job would be discarded when the 
third job is added to the queue.


(*) Updates to a folder in an IMAP server if you're interested.   
Although I'm dealing with only a few bytes at a time, the time-based 
batch is much the same, with the time being dictated by the IMAP client 
rather than any sort of clock.

To unsubscribe, e-mail:
For additional commands, e-mail:

View raw message