lucene-java-user mailing list archives

Site index · List index
Message view « Date » · « Thread »
Top « Date » · « Thread »
From "Peter Veentjer - Anchor Men" <p.veent...@anchormen.nl>
Subject RE: Update performance/indexwriter.delete()?
Date Fri, 15 Apr 2005 14:13:48 GMT
I have done something similar. I have added a creational date to my
Jobs. Some jobs can take some time (analyzing) and all indexwrite jobs
are queued for a threadpool to be processed. After they are analyzed,
they are added in a queue to be written to the index. If 2 (or more)
indexwrite jobs for the same document are found in a single batch, the
oldest one is cancelled. 


-----Oorspronkelijk bericht-----
Van: John Haxby [mailto:jch@scalix.com] 
Verzonden: vrijdag 15 april 2005 16:06
Aan: java-user@lucene.apache.org
Onderwerp: Re: Update performance/indexwriter.delete()?

Roy Klein wrote:

>Here's the scenario that I can't guarantee won't happen:
>
>There might be 3 transactions in a very short time span (for example, 1

>second), here's what they are:
>
>1) update doc1 (DEL doc1, ADD doc1)
>2) update doc2 (DEL doc2, ADD doc2)
>3) delete doc1
>
>If I process these in order, then at the end of the 3 transactions, my 
>index will only have one document in it, doc2.
>  
>
When I did something similar a while back(*) I added a little
intelligence to the insertion of jobs into the queue.  Essentially, if
I'm about to update a document that's already scheduled for updating,
don't update it twice (delete the first job, ignore the new job, which
ever is appropriate).  Similarly, if I'm going to delete a document
that's awaiting update, don't bother with the update (but let the delete
in so that the old document is deleted.  You can handle new documents
and updates to them in a similar vein.

For your particular example, the first job would be discarded when the
third job is added to the queue.

jch


(*) Updates to a folder in an IMAP server if you're interested.   
Although I'm dealing with only a few bytes at a time, the time-based
batch is much the same, with the time being dictated by the IMAP client
rather than any sort of clock.

---------------------------------------------------------------------
To unsubscribe, e-mail: java-user-unsubscribe@lucene.apache.org
For additional commands, e-mail: java-user-help@lucene.apache.org





---------------------------------------------------------------------
To unsubscribe, e-mail: java-user-unsubscribe@lucene.apache.org
For additional commands, e-mail: java-user-help@lucene.apache.org


Mime
View raw message