lucene-dev mailing list archives

From Chuck Williams <>
Subject Re: adding "explicit commits" to Lucene?
Date Tue, 16 Jan 2007 22:40:52 GMT
Michael McCandless wrote on 01/16/2007 12:09 PM:
> Doug Cutting wrote:
>> Michael McCandless wrote:
>>> We could indeed simply tie "close" to mean "commit now", and not add a
>>> separate "commit" method.
>>> But what about the "bulk delete then bulk add" case?  Ideally if a
>>> reader refreshes by checking "isCurrent()" it shouldn't ever open the
>>> index "at a bad time".  I.e., we need a way to open a reader, delete a
>>> bunch of docs, close it *without* committing, open a writer, add a
>>> bunch of docs, and then do the commit, all so that any readers that
>>> are refreshing would know not to open the segments_N that was
>>> committed with all the deletes but none of the adds.  This is one use
>>> case that explicit commits would address.

I've found batched deleteAdd updates to be a bit more complex than this in
two respects.  First, the index is vulnerable after the deletes and
before the revised documents are added, since an error in between could
lose information.  My current application must journal everything it
deletes to account for this.  The proposed commits would remove that
need, since a failed deleteAdd batch could simply be aborted.  Second,
updates may need to hold the revised documents in memory for
performance, currency of simultaneous access, or other reasons, and
memory limits may restrict how many revised documents can be held at
once.  This leads to a memory-driven requirement to break a deleteAdd
batch into multiple sub-batches.  So it should be possible to run a set
of deleteAdd batches as a single transaction, not just one batch.  The
original proposal meets this requirement.
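
The semantics argued for above can be illustrated with a toy Java model. Everything in it is hypothetical: `TxIndex`, `Snapshot`, `begin`, `delete`, `add`, `commit`, `rollback` and `updateInSubBatches` are illustrative names, not Lucene APIs, and a HashMap keyed by document id stands in for the index. It is only a sketch, assuming readers see nothing but committed generations (loosely like segments_N): a reader refreshing via an isCurrent()-style check never observes the deletes-without-adds state, and a failure in any sub-batch aborts the whole set.

```java
import java.util.Collections;
import java.util.HashMap;
import java.util.HashSet;
import java.util.List;
import java.util.Map;
import java.util.Set;

// Hypothetical toy model (not a Lucene API): readers see only committed
// snapshots, identified by a generation number that loosely mirrors segments_N.
final class TxIndex {

    // Immutable point-in-time view, standing in for an opened reader.
    static final class Snapshot {
        final long generation;
        private final Map<String, String> docs;

        Snapshot(long generation, Map<String, String> docs) {
            this.generation = generation;
            this.docs = docs;
        }

        int numDocs() { return docs.size(); }

        String get(String id) { return docs.get(id); }
    }

    private Map<String, String> committed = new HashMap<>(); // id -> document body
    private long generation = 0;                              // bumped once per commit

    // Staged changes; both are null while no transaction is open.
    private Set<String> pendingDeletes = null;
    private Map<String, String> pendingAdds = null;

    synchronized Snapshot openReader() {
        return new Snapshot(generation, Collections.unmodifiableMap(new HashMap<>(committed)));
    }

    // Like an isCurrent() check: stays true through staged changes and
    // becomes false only once a commit publishes a new generation.
    synchronized boolean isCurrent(Snapshot s) {
        return s.generation == generation;
    }

    synchronized void begin() {
        if (pendingAdds != null) throw new IllegalStateException("transaction already open");
        pendingDeletes = new HashSet<>();
        pendingAdds = new HashMap<>();
    }

    synchronized void delete(String id) {
        requireOpen();
        pendingAdds.remove(id); // a later delete cancels an earlier staged add
        pendingDeletes.add(id);
    }

    synchronized void add(String id, String body) {
        requireOpen();
        if (body == null) throw new IllegalArgumentException("null document for " + id);
        pendingAdds.put(id, body);
    }

    // Publishes all staged deletes and adds at once as a new generation.
    synchronized void commit() {
        requireOpen();
        Map<String, String> next = new HashMap<>(committed);
        next.keySet().removeAll(pendingDeletes);
        next.putAll(pendingAdds);
        committed = next;
        generation++;
        pendingDeletes = null;
        pendingAdds = null;
    }

    // Discards everything staged since begin(); no reader ever saw it.
    synchronized void rollback() {
        pendingDeletes = null;
        pendingAdds = null;
    }

    // Runs several deleteAdd sub-batches as ONE transaction; any failure aborts all of them.
    void updateInSubBatches(List<Map<String, String>> subBatches) {
        begin();
        try {
            for (Map<String, String> batch : subBatches) {
                for (String id : batch.keySet()) delete(id);
                for (Map.Entry<String, String> e : batch.entrySet()) add(e.getKey(), e.getValue());
            }
            commit();
        } catch (RuntimeException e) {
            rollback();
            throw e;
        }
    }

    private void requireOpen() {
        if (pendingAdds == null) throw new IllegalStateException("no open transaction");
    }
}
```

Staging deletes separately from the committed state is what lets a failed batch be discarded without journaling the deleted documents. This toy keeps staged documents in memory across sub-batches; in a real index they would presumably be written to new, not-yet-committed segment files instead, so the sub-batches together need not fit in memory.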

>> One could also implement this with a Directory that permits
>> checkpointing and rollback.  Would that be any simpler?
> True (I think?).  Maybe we could push the "transactional" behavior
> down lower (into Directory) in Lucene.  Though all implementations of
> Directory would need to implement their own transactional behavior vs
> one implementation at the reader/writer layer?
> As long as checkpointing is decoupled from the opening/closing of
> readers and writers then I think this would support this use case.
> So basically, the Directory layer would "mimic" the inode model (and
> hard links) that unix filesystems provide?  Or maybe the Directory
> would not make any changes visible to a reader until a writer did a
> "checkpoint"?  But then how would this work across machines (on a
> shared filesystem)?  I'm not sure I see how we could effectively (or
> more simply) push this down into Directory instead of at the
> reader/writer layer.

This seems more complex and less flexible for no benefit.  It's
analogous to a database pushing its transaction model into its file
storage component.  Transactions are a first-class concept with
semantics at the index level.  The original proposal at the index level
seems to me to be easy to implement, easy to understand and easy to use.

