lucene-dev mailing list archives

From Scott Ganyo <scott.ga...@eTapestry.com>
Subject RE: Making Lucene Transactional
Date Fri, 28 Jun 2002 21:54:47 GMT
How about this?  I'll admit it punts a little, but I still think it could be
a working model:

A-tomicity - A single call to the file system would commit the transaction.
In the case of an IndexWriter, calling close() already does this with a
simple rename operation (at least on Unix).  Adding an abort() would throw
away the new files.  Not yet sure of how to achieve this once Document
deletes are thrown in the mix...

C-onsistency - Document deletes must somehow be tracked and applied in a
single operation along with any Document adds.  Again, though, I'm not sure
of how the deletes could be accomplished with the current file format.

I-solation - Just force write transactions to be serialized.  Lucene does
this with IndexWriters anyway.  We could enforce a one-to-one relationship
between transactions and IndexWriters...

D-urability - Lucene would do its best.  Once the data is written to
the disk, however, it is outside of Lucene's domain.  Wouldn't a journaled
filesystem take care of this?
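
To make the model above concrete, here is a minimal sketch in plain Java.
StagedBatch, its ".new" suffix, and the single JVM-wide lock are all
hypothetical names chosen for illustration; none of this is Lucene API, and
it ignores Document deletes entirely.  It only shows the shape: stage new
data in a side file, publish it with one rename, throw it away on abort,
and serialize writers.

```java
import java.io.IOException;
import java.nio.ByteBuffer;
import java.nio.channels.FileChannel;
import java.nio.charset.StandardCharsets;
import java.nio.file.Files;
import java.nio.file.Path;
import java.nio.file.StandardCopyOption;
import java.nio.file.StandardOpenOption;
import java.util.concurrent.locks.ReentrantLock;

/** Hypothetical sketch (not Lucene code): one batch == one staged file == one rename. */
final class StagedBatch implements AutoCloseable {
    // Isolation by brute force: writers in different threads take turns.
    // (The lock is reentrant, so this does not stop one thread from nesting batches.)
    private static final ReentrantLock WRITE_LOCK = new ReentrantLock();

    private final Path live;      // the file readers open
    private final Path staging;   // only this batch writes here
    private final FileChannel out;
    private boolean open = true;

    StagedBatch(Path live) throws IOException {
        this.live = live;
        this.staging = live.resolveSibling(live.getFileName() + ".new");
        WRITE_LOCK.lock();
        FileChannel ch;
        try {
            ch = FileChannel.open(staging, StandardOpenOption.CREATE,
                    StandardOpenOption.TRUNCATE_EXISTING, StandardOpenOption.WRITE);
        } catch (IOException | RuntimeException e) {
            WRITE_LOCK.unlock();  // do not hold the lock if we never started
            throw e;
        }
        this.out = ch;
    }

    /** Appends one record to the staged file; readers cannot see it yet. */
    void add(String record) throws IOException {
        ByteBuffer buf = ByteBuffer.wrap((record + "\n").getBytes(StandardCharsets.UTF_8));
        while (buf.hasRemaining()) {
            out.write(buf);
        }
    }

    /** Publishes the batch with a single rename. */
    void commit() throws IOException {
        if (!open) throw new IllegalStateException("batch already finished");
        try {
            out.force(true);  // best-effort durability: ask the OS to flush to the device
            out.close();
            // Atomic on POSIX filesystems; whether an existing target is replaced
            // is implementation specific according to the Files.move documentation.
            Files.move(staging, live, StandardCopyOption.ATOMIC_MOVE);
        } finally {
            release();
        }
    }

    /** Throws away the staged file; the live file is untouched. */
    void abort() throws IOException {
        if (!open) return;
        try {
            out.close();
            Files.deleteIfExists(staging);
        } finally {
            release();
        }
    }

    // Must run on the thread that created the batch, since it unlocks WRITE_LOCK.
    private void release() throws IOException {
        open = false;
        try {
            out.close();  // no-op if the channel is already closed
        } finally {
            WRITE_LOCK.unlock();
        }
    }

    /** Closing without commit() is treated as abort(). */
    @Override
    public void close() throws IOException {
        abort();
    }
}
```

Used in a try-with-resources block, a batch that never reaches commit() is
aborted on the way out, which is roughly the abort() I had in mind for
IndexWriter.  What the sketch does not answer is the same thing I can't
answer above: how deletes against existing files would join the rename.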

Scott

> -----Original Message-----
> From: Brian Goetz [mailto:brian@quiotix.com]
> Sent: Friday, June 28, 2002 1:58 PM
> To: Lucene Developers List
> Subject: Re: Making Lucene Transactional
> 
> 
> > I think that much of the goal can be accomplished with a much smaller
> > effort than you are suggesting by making a couple of simplifying
> > assumptions:
> > 
> > 1) Assume the filesystem is stable.  There are ways to accomplish that
> > outside of Lucene anyway.
> > 
> > 2) Assume write transactions will be serialized.  This removes any need
> > for complex write locking strategies.
> 
> But these assumptions are not valid.  
> 
> Now, if you want to talk about introducing a concept of "batched updates"
> into Lucene, where a batch is applied atomically, that could be a useful
> improvement.  But to pretend we offer transactional semantics when we
> don't just seems silly.  
> 
> > So, yes, there's some work that would have to be done, but I'm not at
> > all convinced that it would be prohibitively challenging.  Did I miss
> > anything?
> 
> It could just be terminology, but I dislike describing something as if
> it has transactional semantics when it doesn't.  And given that the
> file system is simply not transactional, anything built on top of it
> will not be, either.  
> 
> There's nothing wrong with trying to make Lucene _more_ stable, _less_
> likely to get corrupted if something bad happens, etc.  But this is
> not making it transactional.  And talking about two-phase commit implies
> that it works with an outside transaction monitor.
> 
> We're already doing much, much better than most search engines because
> all additions are done by creating new segments, so as long as rename
> is atomic, users will not see an inconsistent state.  However, in the
> case of disk failure, we're going to be subject to the same risks as
> any other file-based application unless we implement a transaction
> log.
> 
> I do like the idea of grouping updates and making them visible
> atomically as a group, if that's not a lot of additional work.
> 
