List-Id: "Lucene Developers List"
Message-ID: <20050214180223.40088.qmail@web50203.mail.yahoo.com>
Date: Mon, 14 Feb 2005 10:02:23 -0800 (PST)
From: Oscar Picasso
Subject: Transactional Directories
To: lucene-dev

Hi,

I am currently implementing a Directory backed by Berkeley DB, which I am willing to release as an open source project.

Besides the internal implementation, it differs from the one in the sandbox in that it is built on the Berkeley DB Java Edition. Using the Java Edition makes distribution easier: you just add a single jar to your classpath and you have a fully functional Berkeley DB embedded in your application, without the hassle of installing the C Berkeley DB.

Although initially implemented with the Java Edition, this Directory can easily be ported to the Berkeley DB C edition or to Berkeley DB XML (for example, to use Berkeley DB XML + Lucene as the base for a document management system).

This implementation works fine and I am quite happy with its speed.

There is still one important problem I face, and it has to do with how to deal with transactions. After all, the purpose of a Berkeley DB implementation, or a JDBC one for that matter, is its ability to use transactions.

After looking at Andy Varga's code, it seems that the implementation in the sandbox faces the same problem (correct me if I am wrong). I have also learned that the JDBC directory was not implemented with transactions in mind.

Here is the problem.
If I do something like this:

-- case A --
+begin transaction
    new IndexWriter
    create/update/delete objects in the database
    indexWriter.addDocument (related to the objects)
    indexWriter.close()
+commit

everything is fine. The operations are transactionally protected. You can even do many writes/updates. As long as everything is enclosed by the pairs begin-transaction/new-IndexWriter ... indexWriter.close()/commit, everything is properly undone if any operation fails inside the transaction. For batch insertions the whole batch is rolled back, but at least your object database stays consistent with the index.

If you do mostly batch insertions and relatively few random individual insertions, that's fine. However, with a relatively high number of random insertions, the cost of the "new IndexWriter / indexWriter.close()" pair performed for each insertion is too high. Unfortunately this is a common case for some kinds of applications, and it is where a transactional directory would be the most useful.

In such a case you would like to do something like this:

-- case B --
new IndexWriter
...
+begin transaction-1
    create/update/delete objects in the database
    indexWriter.addDocument (related to the objects)
+commit
...
+begin transaction-2
    create/update/delete objects in the database
    indexWriter.addDocument (related to the objects)
+commit
...
indexWriter.close()

The benefit would be to protect individual insertions while avoiding the cost of creating a new IndexWriter each time.

It doesn't work, however. Here is my understanding.

Suppose that in case B, transaction-1 fails and transaction-2 succeeds. In that case the underlying database system rolls back all the writes done during transaction-1, whether they were related to the objects stored in the database or to the index (the writes done to the IndexOutput are also undone). From the database's point of view, consistency is maintained between the stored objects and the index.
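To make the database side of case B concrete, here is a minimal Java sketch. It is only an illustration under stated assumptions: ToyTxStore is a hypothetical in-memory stand-in for a transactional database (it is not the Berkeley DB API), the "object/..." and "index/..." key names are invented, and no Lucene classes are involved. It shows only that a rollback of transaction-1 discards both the object write and the index-file write, while transaction-2 keeps both.

```java
import java.util.HashMap;
import java.util.Map;
import java.util.Set;
import java.util.TreeSet;

/** Hypothetical in-memory transactional store (an assumption, NOT the Berkeley DB API). */
final class ToyTxStore {
    private final Map<String, String> committed = new HashMap<>();
    private Map<String, String> pending; // null when no transaction is open

    void begin() {
        if (pending != null) throw new IllegalStateException("transaction already open");
        pending = new HashMap<>(committed); // work on a private copy until commit
    }

    void put(String key, String value) {
        if (pending == null) throw new IllegalStateException("no open transaction");
        pending.put(key, value);
    }

    void commit() {
        if (pending == null) throw new IllegalStateException("no open transaction");
        committed.clear();
        committed.putAll(pending);
        pending = null;
    }

    void rollback() {
        pending = null; // drop every write made since begin()
    }

    Set<String> committedKeys() {
        return new TreeSet<>(committed.keySet()); // sorted for stable output
    }
}

final class CaseBSketch {
    /** One transaction that writes an object and its index data; rolls back on failure. */
    static boolean insert(ToyTxStore store, String id, boolean fail) {
        store.begin();
        try {
            store.put("object/" + id, "payload-" + id);
            // Stands in for what a Directory's IndexOutput would write (invented key name).
            store.put("index/_" + id + ".seg", "postings-" + id);
            if (fail) throw new RuntimeException("simulated failure in transaction-" + id);
            store.commit();
            return true;
        } catch (RuntimeException e) {
            store.rollback();
            return false;
        }
    }

    /** Case B: transaction-1 fails, transaction-2 succeeds. */
    static Set<String> run() {
        ToyTxStore store = new ToyTxStore();
        insert(store, "1", true);
        insert(store, "2", false);
        return store.committedKeys();
    }

    public static void main(String[] args) {
        System.out.println(run()); // prints [index/_2.seg, object/2]
    }
}
```

Nothing from transaction-1 survives in the store, so the stored objects and the index data agree with each other. The sketch deliberately leaves out any state held in memory by the index writer, which the store cannot roll back.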
The problem is that after transaction-1 is rolled back, Lucene still 'remembers' the segment(s) it wrote during transaction-1. Later, Lucene might 'want' to perform some operation based on these references (when merging segments, I think) while the underlying segment files no longer exist. This is where an Exception is thrown.

The solution would be to instruct Lucene to 'forget' or undo any reference to the segments created during transaction-1 in case of a rollback.

I have noticed that references to the segments are stored in a segmentInfos map. I was thinking about removing the segmentInfos entries created during transaction-1 in case of a rollback, but I can't tell whether that is enough and/or potentially dangerous.

I would really appreciate any comments on this idea and on my understanding of the Lucene indexing process. If I/we could find a solution, it would also benefit a JDBC Directory implementation.

Thanks.

Oscar

P.S.: If and when my implementation is fully functional, is there a place in the Lucene project where I could release it? (Maybe the sandbox.)

---------------------------------------------------------------------
To unsubscribe, e-mail: lucene-dev-unsubscribe@jakarta.apache.org
For additional commands, e-mail: lucene-dev-help@jakarta.apache.org