lucene-dev mailing list archives

From Doug Cutting <cutt...@apache.org>
Subject Re: Transactional Directories
Date Mon, 14 Feb 2005 21:55:34 GMT


Oscar Picasso wrote:
> Hi,
> 
> I am currently implementing a Directory backed by a Berkeley DB that I am
> willing to release as an open source project.
> 
> Besides the internal implementation, it differs from the one in the sandbox in
> that it is implemented with the Berkeley DB Java Edition.
> 
> Using the Java Edition allows an easier distribution as you just need to add a
> single jar in your classpath and you have a fully functional Berkeley DB
> embedded in your application without the hassle of installing the C Berkeley
> DB.
> 
> While initially implemented with the Java Edition this Directory can easily be
> ported to a Berkeley DB C edition or a Berkeley DB XML (for example to use
> Berkeley DB XML + Lucene as the base for a document management system).
> 
> This implementation works fine and I am quite happy with its speed.
> 
> There is still an important problem I face and it has to do with how to deal
> with some transactions. After all, the purpose of a Berkeley implementation, or
> a JDBC one for that matter, is its ability to use transactions.
> 
> After looking at Andi Vajda's code, it seems that the implementation in the
> sandbox faces the same problem (correct me if I am wrong). I have also learned
> that the JDBC directory was not implemented with transactions in mind.
> 
> Here is the problem.
> 
> If I do something like this:
> -- case A --
> <pseudo-code>
> +begin transaction
>  new IndexWriter
>  create/update/delete objects in the database
>  indexWriter.addDocument (related to the objects)
>  indexWriter.close()
> +commit
> </pseudo-code>
> 
> Everything is fine. The operations are transactionally protected. You can even
> do many writes/updates. As long as everything is enclosed by the pair
> begin-transaction/new-index-writer ... index-writer.close/commit, everything is
> properly undone in case any operation fails inside the transaction.
> 
> For batch insertions the whole batch is rolled back but at least your object
> database is consistent with the index.
> 
> If you do mostly batch insertions and relatively few random individual
> insertions, that's fine.
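[Case A can be modeled with a small self-contained sketch. All class and method names here are invented for illustration; this is not Lucene's or Berkeley DB's API. The point is that the writer's in-memory segment references live and die inside a single transaction, so a rollback discards the file writes and the references together:]

```java
import java.util.*;

public class CaseA {
    // Committed "directory" contents, shared across batches.
    static Map<String, String> committedFiles = new HashMap<>();

    // One whole "begin txn / new IndexWriter ... close / commit" cycle.
    // Both the pending file writes and the writer's segment references are
    // local to this method, so a rollback discards them together.
    static boolean runBatch(List<String> docs, boolean fail) {
        Map<String, String> pending = new HashMap<>();   // txn-local file writes
        List<String> segmentInfos = new ArrayList<>();   // writer-local segment refs
        for (int i = 0; i < docs.size(); i++) {
            String name = "seg-" + committedFiles.size() + "-" + i;
            pending.put(name, docs.get(i));
            segmentInfos.add(name);
        }
        if (fail) return false;          // rollback: writes and refs both vanish
        committedFiles.putAll(pending);  // commit: files and refs stay consistent
        return true;
    }

    public static void main(String[] args) {
        runBatch(List.of("doc1", "doc2"), true);   // failed batch leaves nothing behind
        runBatch(List.of("doc3"), false);          // successful batch
        System.out.println(committedFiles.size()); // prints 1
    }
}
```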
> 
> However, with a relatively high number of random insertions, the cost of the
> "new IndexWriter / indexWriter.close()" performed for each insertion is too
> high. Unfortunately this is a common case for some kinds of applications, and
> it is where a transactional directory would be the most useful.
> 
> In such a case you would like to do something like this:
> -- case B --
> <pseudo-code>
> new IndexWriter
>  ...
> +begin transaction-1
>  create/update/delete objects in the database
>  indexWriter.addDocument (related to the objects)
> + commit
> ...
> +begin transaction-2
>  create/update/delete objects in the database
>  indexWriter.addDocument (related to the objects)
> + commit
> ...
> indexWriter.close()
> </pseudo-code>
> 
> The benefits would be to protect individual insertions while avoiding the cost
> of using each time a new IndexWriter.
> 
> It doesn't work, however. Here is my understanding.
> 
> Suppose that in case B, transaction-1 fails and transaction-2 succeeds.
> 
> In that case the underlying database system rolls back all the writes done
> during transaction-1 whether they were related to the objects stored in the
> database or to the index (the writes done to the IndexOutput are also undone).
> From the database point of view consistency is maintained between the stored
> object and the index.
> 
> The problem is that after transaction-1 Lucene still 'remembers' the segment(s)
> it wrote during transaction-1. Later, Lucene might 'want' to perform some
> operation based on these references (when merging the segments, I think) while
> the underlying segment files no longer exist. This is where an exception is
> thrown.
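[The failure mode described above can be reproduced in miniature, with no Lucene or Berkeley DB required; every name below is invented for illustration. A transactional "directory" rolls back its file writes, but the writer's in-memory segment list still remembers the rolled-back segment, so a later merge hits a missing file:]

```java
import java.util.*;

// Toy transactional "directory": file writes in an open transaction can
// be rolled back, in which case they simply never reach the file map.
class TxnDirectory {
    final Map<String, String> files = new HashMap<>();  // committed files
    Map<String, String> pending = new HashMap<>();      // writes in current txn

    void begin()    { pending = new HashMap<>(); }
    void write(String name, String data) { pending.put(name, data); }
    void commit()   { files.putAll(pending); pending = new HashMap<>(); }
    void rollback() { pending = new HashMap<>(); }      // file writes vanish
}

// Toy "writer": remembers every segment it ever created, even if the
// enclosing transaction later rolls the segment file back.
class ToyWriter {
    final TxnDirectory dir;
    final List<String> segmentInfos = new ArrayList<>();
    int seg = 0;
    ToyWriter(TxnDirectory dir) { this.dir = dir; }

    void addDocument(String doc) {
        String name = "segment-" + (seg++);
        dir.write(name, doc);
        segmentInfos.add(name);   // reference survives a rollback
    }

    // Merging reads every remembered segment; a stale reference throws.
    String merge() {
        StringBuilder sb = new StringBuilder();
        for (String name : segmentInfos) {
            String data = dir.files.get(name);
            if (data == null)
                throw new IllegalStateException("missing segment file: " + name);
            sb.append(data);
        }
        return sb.toString();
    }
}

public class CaseB {
    public static void main(String[] args) {
        TxnDirectory dir = new TxnDirectory();
        ToyWriter writer = new ToyWriter(dir);

        dir.begin(); writer.addDocument("doc1"); dir.rollback(); // transaction-1 fails
        dir.begin(); writer.addDocument("doc2"); dir.commit();   // transaction-2 succeeds

        // prints "merge failed: missing segment file: segment-0"
        try {
            writer.merge();
            System.out.println("merge ok");
        } catch (IllegalStateException e) {
            System.out.println("merge failed: " + e.getMessage());
        }
    }
}
```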
> 
> The solution would be to instruct Lucene to 'forget' or undo any reference to
> the segments created during transaction-1 in case of a rollback.
> 
> I have noticed that references to the segments are stored in a segmentInfos
> structure. I was thinking about removing the segmentInfos entries created
> during transaction-1 in case of a rollback, but I don't know whether that is
> enough and/or potentially dangerous.
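[A minimal sketch of the proposed "forget on rollback" idea, assuming the segment references can be checkpointed at transaction boundaries. The names are invented; this is not Lucene's actual SegmentInfos class, just the shape of the bookkeeping:]

```java
import java.util.*;

// Segment reference list with transaction-boundary checkpoints: on
// rollback, references added during the failed transaction are dropped,
// so no dangling entry outlives its rolled-back segment file.
class SegmentList {
    private final List<String> segments = new ArrayList<>();
    private int checkpoint = 0;

    void beginTxn()    { checkpoint = segments.size(); }  // remember pre-txn state
    void add(String name) { segments.add(name); }
    void commitTxn()   { checkpoint = segments.size(); }
    void rollbackTxn() {                                  // forget this txn's refs
        segments.subList(checkpoint, segments.size()).clear();
    }
    List<String> list() { return Collections.unmodifiableList(segments); }
}

public class RollbackDemo {
    public static void main(String[] args) {
        SegmentList infos = new SegmentList();

        infos.beginTxn(); infos.add("segment-0"); infos.rollbackTxn(); // txn-1 fails
        infos.beginTxn(); infos.add("segment-1"); infos.commitTxn();   // txn-2 succeeds

        // Only the committed segment survives, so a later merge sees no
        // dangling reference to the rolled-back segment-0.
        System.out.println(infos.list()); // prints [segment-1]
    }
}
```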
> 
> I would really appreciate any comment about this idea and also about my
> understanding of the Lucene indexing process.
> 
> If I/we could find a solution, it would also benefit a JDBC Directory
> implementation.
> 
> Thanks.
> 
> Oscar
> 
> P.S.: If and when my implementation is fully functional, is there a place in
> the Lucene project where I could release it? (Maybe the sandbox).
> 
> 
> 
> ---------------------------------------------------------------------
> To unsubscribe, e-mail: lucene-dev-unsubscribe@jakarta.apache.org
> For additional commands, e-mail: lucene-dev-help@jakarta.apache.org
> 
