lucene-dev mailing list archives

From Scott Ganyo <>
Subject RE: Making Lucene Transactional
Date Fri, 28 Jun 2002 17:46:19 GMT
I think that much of the goal can be accomplished with a much smaller effort
than you are suggesting by making a couple of simplifying assumptions:

1) Assume the filesystem is stable.  There are ways to accomplish that
outside of Lucene anyway.

2) Assume write transactions will be serialized.  This removes any need for
complex write locking strategies.

3) Assume that any transaction monitoring would be outside of Lucene.

What that leaves us with, then, is merely adding to Lucene the ability to
execute the following: begin(), commit(), abort()... and possibly prepare().
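
To make assumption 2 and the begin()/commit()/abort() calls a bit more
concrete, here is a minimal Java sketch.  SerializedTxLog and all of its
methods are hypothetical names, not part of Lucene, and "documents" are
just strings held in memory.  A single lock serializes write transactions,
so nothing more elaborate is needed on the locking side.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.concurrent.locks.ReentrantLock;

// Hypothetical illustration only; none of these names exist in Lucene.
class SerializedTxLog {
    // One lock for all writers: holding it means "a transaction is open".
    private final ReentrantLock writeLock = new ReentrantLock();
    private final List<String> committed = new ArrayList<>();
    private List<String> pending;

    // Blocks until any other writer's transaction has ended.
    void begin() {
        writeLock.lock();
        if (writeLock.getHoldCount() > 1) {
            // Same thread called begin() twice; nesting is not supported here.
            writeLock.unlock();
            throw new IllegalStateException("transaction already active");
        }
        pending = new ArrayList<>();
    }

    // Buffers a change; nothing is visible until commit().
    void add(String doc) {
        requireActive();
        pending.add(doc);
    }

    // Publishes the buffered changes and ends the transaction.
    void commit() {
        requireActive();
        committed.addAll(pending);
        end();
    }

    // Discards the buffered changes and ends the transaction.
    void abort() {
        requireActive();
        end();
    }

    // Returns a copy of the committed state.
    List<String> committed() {
        writeLock.lock();
        try {
            return new ArrayList<>(committed);
        } finally {
            writeLock.unlock();
        }
    }

    private void requireActive() {
        if (!writeLock.isHeldByCurrentThread()) {
            throw new IllegalStateException("no active transaction");
        }
    }

    private void end() {
        pending = null;
        writeLock.unlock();
    }
}
```

A caller would do begin(), some add() calls, then either commit() or
abort().  A prepare() step is left out of the sketch; under assumption 3 it
would only matter to a transaction monitor living outside Lucene.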

I don't see much of a problem implementing these semantics for adding
documents, as the IndexWriter already pretty much follows a transactional
pattern with a very low probability of failure.  The commit semantics
would then merely be a rename/no-rename decision on the new segments file.
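
The rename/no-rename decision could look roughly like the Java sketch
below.  It is an illustration under assumptions, not what IndexWriter
actually does: the SegmentsCommit class and the "segments.new" temporary
name are made up, and it uses the java.nio.file API.  Note that with
ATOMIC_MOVE, whether an existing target is replaced is implementation
specific according to the Javadoc, which is the same platform caveat Doug
describes for Win32 below.

```java
import java.io.IOException;
import java.nio.charset.StandardCharsets;
import java.nio.file.Files;
import java.nio.file.Path;
import java.nio.file.StandardCopyOption;

// Hypothetical helper; the class and the temporary file name are assumptions.
class SegmentsCommit {

    // Writes the new segment list to a temporary file next to "segments".
    // Until commit() runs, readers still see the old "segments" file.
    static Path prepare(Path indexDir, String contents) throws IOException {
        Path tmp = indexDir.resolve("segments.new");
        Files.write(tmp, contents.getBytes(StandardCharsets.UTF_8));
        return tmp;
    }

    // Commit: rename the temporary file into place in one step.
    // Throws AtomicMoveNotSupportedException if the file system cannot do it.
    static void commit(Path indexDir, Path tmp) throws IOException {
        Files.move(tmp, indexDir.resolve("segments"),
                   StandardCopyOption.ATOMIC_MOVE);
    }

    // Abort: no rename; just drop the temporary file.
    static void abort(Path tmp) throws IOException {
        Files.deleteIfExists(tmp);
    }
}
```

Either outcome leaves "segments" whole: after abort() it is untouched, and
after commit() it is the complete new file.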

Deletions, on the other hand, seem more problematic.  First of all there is
the asymmetry of having the delete on the IndexReader.  In fact, the need for
serialized control of write/delete access caused me to write my
application's interface to Lucene to go through only two access points
(IndexSearcher and IndexWriter) and force access to the delete() method
through the IndexWriter.  Even doing that, though, I don't think the
document deletion process currently has the capability to batch up its
changes and commit them.  This would need to be added.
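
What "batching up" deletions might mean, as a minimal Java sketch: the
BatchedDeletes class and its methods are hypothetical, and documents are
identified by plain integer ids for the sake of the example.  Deletes are
recorded in a pending set and only merged into the visible set on commit.

```java
import java.util.BitSet;

// Hypothetical illustration; not an existing Lucene class.
class BatchedDeletes {
    private final BitSet deleted = new BitSet();  // committed deletions
    private final BitSet pending = new BitSet();  // deletions awaiting commit

    // Records a deletion without making it visible yet.
    void delete(int docId) {
        pending.set(docId);
    }

    // Makes all pending deletions visible in one step.
    void commit() {
        deleted.or(pending);
        pending.clear();
    }

    // Forgets all pending deletions.
    void abort() {
        pending.clear();
    }

    // Reports committed deletions only.
    boolean isDeleted(int docId) {
        return deleted.get(docId);
    }
}
```

Making the commit step durable on disk is a separate problem that this
in-memory sketch does not address.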

Finally, the additions and deletions would need to be coordinated to allow
both types of changes under a transaction.

So, yes, there's some work that would have to be done, but I'm not at all
convinced that it would be prohibitively challenging.  Did I miss anything?


> -----Original Message-----
> From: Brian Goetz []
> Sent: Friday, June 28, 2002 9:45 AM
> To: Lucene Developers List
> Subject: Re: Making Lucene Transactional
> > That's interesting.  So it would be a very small change to add
> > transactional (and even 2-phase commit) capabilities to the writer?
> > What about deletes?  Since they use the reader, would it still be
> > possible to allow a 2-phase commit/abort on that?
> I think you're not using "transactional" in the same sense as Doug is.
> Very few file systems are transactional, although some offer a small
> number of atomic operations, such as rename.  This doesn't make them
> transactional, but it allows application writers (that's us) to write
> apps that are _less likely_ to be victimized by system failure.  But
> Lucene still writes blocks to disk via the file system, without a
> transaction log, and since disk drivers do things like defer or
> reorder disk writes, we could still lose if the system crashed at the
> wrong time.  Still, we do a lot to reduce this risk beyond that of
> most file-based applications.
> > I would very much like to have a 2-phase commit in Lucene in order to
> > ensure that it is always in sync with my database.  I always thought
> > that I'd end up having to write custom code to store the Lucene index
> > in the database, but maybe that wouldn't be necessary...?
> Two phase commit is a whole different beast; this involves
> coordinating multiple transactional resource managers (which Lucene
> isn't) with a separate transaction monitor, using a protocol such as
> XA or OTS.  We're nowhere near that.  
> Storing the index in a database would be a good start, although the
> Directory interface is really derived with the assumptions of a file
> system.  Still, that would not get us all the way there -- you'd need
> to introduce transaction demarcation methods into the Lucene API, so
> that these could be passed to the DBDirectory, so we would know what
> groups of updates should be considered atomic.  
> And that still doesn't get us close to 2PC; we'd still have to support
> XA for that, and I don't see any good reason to undertake that level
> of effort at this point.  
> However, I think revisiting Directory with an eye towards making it
> something that can be efficiently implemented on either a DB or a file
> system would be worthwhile.  
> > > -----Original Message-----
> > > From: Doug Cutting []
> > > Sent: Thursday, June 27, 2002 10:36 AM
> > > To: Lucene Users List
> > > Subject: Re: Stress Testing Lucene
> > > 
> > > 
> > > > It's very hard to leave an index in a bad state.  Updating the
> > > > "segments" file atomically updates the index.  So the only way to
> > > > corrupt things is to only partly update the segments file.  But that
> > > > too is hard, since it's first written to a temporary file, which is
> > > > then renamed "segments".  The only vulnerability I know of is that in
> > > > Java on Win32 you can't atomically rename a file to something that
> > > > already exists, so Lucene has to first remove the old version.  So if
> > > > you were to crash between the time that the old version of "segments"
> > > > is removed and the new version is moved into place, then the index
> > > > would be corrupt, because it would have no "segments" file.
> > > 
> > > Doug