lucene-dev mailing list archives

From Doug Cutting <DCutt...@grandcentral.com>
Subject RE: Delete is not multi-thread safe
Date Thu, 31 Jan 2002 18:08:51 GMT
> From: Dmitry Serebrennikov [mailto:dmitrys@earthlink.net]
>
> >It seems that either a) deletes should be write-through, or b) deletes
> >should be done by the writer, or c) writer should not optimize non-RAM
> >segments unless asked to. As a client, I like option b) the best, though,
> >this is not the easiest option to implement. My $0.02
>
> Or maybe
> d) when merging, a writer should share an in-memory image of segment1 
> and prohibit any deletes on segment one while merge is in progress?

Or maybe:
e) Deleting from a reader while an IndexWriter is open on the same index
should throw an exception.  This just requires the delete code to obtain the
write.lock.
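
A minimal sketch of option (e), using a toy model rather than Lucene's
real classes.  ToyIndex, ToyWriter and ToyReader are hypothetical names,
and holding the lock from the first delete until close() is an assumption
made for illustration.  The only idea taken from the message is that the
delete path obtains the same write.lock as the writer, so a conflicting
delete fails with an exception instead of silently corrupting the index.

```java
import java.util.concurrent.atomic.AtomicBoolean;

// Toy model only: these classes are hypothetical and are not Lucene's API.
class ToyIndex {
    // Stands in for the on-disk write.lock file.
    private final AtomicBoolean writeLock = new AtomicBoolean(false);

    // Try to take the lock; fail fast instead of blocking.
    void obtainWriteLock(String who) {
        if (!writeLock.compareAndSet(false, true)) {
            throw new IllegalStateException(who + ": index is locked for write");
        }
    }

    void releaseWriteLock() {
        writeLock.set(false);
    }
}

class ToyWriter implements AutoCloseable {
    private final ToyIndex index;

    ToyWriter(ToyIndex index) {
        index.obtainWriteLock("writer");   // held for the writer's lifetime
        this.index = index;
    }

    @Override
    public void close() {
        index.releaseWriteLock();
    }
}

class ToyReader implements AutoCloseable {
    private final ToyIndex index;
    private boolean holdsLock = false;

    ToyReader(ToyIndex index) {
        this.index = index;
    }

    // Assumption: the first delete takes the write lock and keeps it until close().
    void deleteDocument(int docNum) {
        if (!holdsLock) {
            index.obtainWriteLock("reader.deleteDocument");
            holdsLock = true;
        }
        // ... mark docNum as deleted (omitted in this toy model) ...
    }

    @Override
    public void close() {
        if (holdsLock) {
            index.releaseWriteLock();
            holdsLock = false;
        }
    }
}
```

With this arrangement, calling deleteDocument while a ToyWriter is open
throws IllegalStateException, and so does opening a ToyWriter while a
reader with pending deletes is still open.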

Deletions and additions must happen serially.  In particular, the intended
order of operations is:
  reader.open();
  reader.deleteDocument(...);
  reader.close();
  writer.open();
  writer.addDocument(...);
  writer.close();

The bug is that this is not enforced, nor is it well documented.  Let's fix
that first.  Another bug might be that IndexWriter is a misnomer: it should
really be called something like DocumentAdder.

> Personally, I would also like to see deletion moved into the writer. 

And I'd like to see cars outlawed.

Yes, this would be a cleaner API, but it would also encourage folks to write
less efficient index updating code.  The most efficient approach is to batch
deletions and additions separately.  Intermingling them will never be as
fast.  The current API encourages one to do things this way.  Also,
currently the deletion code is very simple and easy to maintain.  Optimizing
intermingled additions and deletions would require adding a lot of new code,
substantially complicating Lucene, and likely introducing bugs.
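
The batching pattern can be sketched with a toy model.  BatchUpdateSketch
and its members are hypothetical names, not Lucene code, and the session
counters only make the number of reader/writer open-close cycles visible;
they do not measure real I/O cost.

```java
import java.util.LinkedHashMap;
import java.util.Map;

// Toy model only; BatchUpdateSketch and its members are hypothetical names.
class BatchUpdateSketch {
    // Stand-in for an index: one entry per live document, keyed by an id field.
    final Map<String, String> docs = new LinkedHashMap<>();
    // Count how often each kind of session is opened.
    int readerSessions = 0;
    int writerSessions = 0;

    // Batched: one delete pass with a single reader, then one add pass with a single writer.
    void applyBatched(Map<String, String> updates) {
        readerSessions++;                          // "open reader" once
        for (String id : updates.keySet()) {
            docs.remove(id);                       // delete the old version, if any
        }
        writerSessions++;                          // "close reader", then "open writer" once
        for (Map.Entry<String, String> e : updates.entrySet()) {
            docs.put(e.getKey(), e.getValue());    // add the new version
        }
    }

    // Intermingled: a separate reader and writer session for every document.
    void applyIntermingled(Map<String, String> updates) {
        for (Map.Entry<String, String> e : updates.entrySet()) {
            readerSessions++;
            docs.remove(e.getKey());
            writerSessions++;
            docs.put(e.getKey(), e.getValue());
        }
    }
}
```

Both methods leave the same documents in the toy index.  For N updates the
batched version opens one reader and one writer, while the intermingled
version opens N of each.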

Some background:  To delete a document we need an IndexReader to find its
document number.  To add a document we just need to add a new segment,
opening no readers.  Periodically a subset of the segments is opened by a
reader to merge them.
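
To make the asymmetry concrete, here is a toy model of a segmented index.
ToySegment and ToySegmentedIndex are hypothetical names and the structure
is greatly simplified: there is no term dictionary, documents are looked up
by a plain id string, and the merge always combines every segment rather
than a subset.

```java
import java.util.ArrayList;
import java.util.BitSet;
import java.util.List;

// Toy model only; ToySegment and ToySegmentedIndex are hypothetical names.
class ToySegment {
    final List<String> ids = new ArrayList<>();  // position = document number within the segment
    final BitSet deleted = new BitSet();         // one bit per document number
}

class ToySegmentedIndex {
    final List<ToySegment> segments = new ArrayList<>();

    // Adding never reads existing segments: it just appends a new one.
    void addDocument(String id) {
        ToySegment s = new ToySegment();
        s.ids.add(id);
        segments.add(s);
    }

    // Deleting must search the segments to find the document number,
    // which is the reader's job. Returns how many documents were marked.
    int deleteDocument(String id) {
        int marked = 0;
        for (ToySegment s : segments) {
            for (int docNum = 0; docNum < s.ids.size(); docNum++) {
                if (s.ids.get(docNum).equals(id) && !s.deleted.get(docNum)) {
                    s.deleted.set(docNum);
                    marked++;
                }
            }
        }
        return marked;
    }

    // Merging reads the old segments and writes one new segment,
    // leaving out documents that were marked as deleted.
    void mergeAll() {
        ToySegment merged = new ToySegment();
        for (ToySegment s : segments) {
            for (int docNum = 0; docNum < s.ids.size(); docNum++) {
                if (!s.deleted.get(docNum)) {
                    merged.ids.add(s.ids.get(docNum));
                }
            }
        }
        segments.clear();
        segments.add(merged);
    }
}
```

In this model addDocument touches nothing that already exists, while
deleteDocument has to look inside every segment.  A merge replaces the
segments and renumbers the surviving documents, so anything that cached
document numbers before the merge is stale afterwards.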

If deletion were added to an IndexWriter it would need to have an
IndexReader opened on all segments, in order to find the document number and
mark it as deleted.  Each time a document is added or segments are merged
this reader must be invalidated.  It would be very inefficient to re-open
this IndexReader each time a document is deleted, so code would need to be
added to incrementally update a SegmentsReader in light of document
additions and merges.  Such a reader could also be optimized to only open
those files that are required for deletion.  Still, intermingling inserts
and deletes would be less efficient, since it would require the dictionaries
for each altered segment to be re-read in order to find the document number.

So it could be done.  But should it be?

Doug
