From: "Marios Skounakis"
Reply-To: java-user@lucene.apache.org
Subject: Re: Lucene & Transactional semantics
Date: Fri, 18 Nov 2005 15:46:12 +0200
Message-ID: <023001c5ec46$70d350d0$0401a8c0@MARIO>
References: <007c01c5ea92$9df48620$0401a8c0@MARIO> <437CF39A.8030503@tera-code.com.ar>

Beto,

Here is an idea I have been working on as a workaround:

Suppose you want to create a new document. The steps to do that are:

1. Insert the document into a "Pending Documents" table in the database.
2. Index the document with Lucene.
3. Insert the document into the "Documents" table in the database, and
   remove it from the "Pending Documents" table, in a single transaction.

Periodically, delete from Lucene's index any documents found in the
"Pending Documents" table. Also, when returning results, filter out
documents found in the "Pending Documents" table.

Basically, the Pending Documents table stores the documents that have
been indexed by Lucene but have not yet been inserted in the database.
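
To make the sequence concrete, here is a rough, single-threaded sketch in
Python. It is not real Lucene or database code: the three dictionaries are
in-memory stand-ins for the two tables and the index, and the class name,
method names and the fail_after_step parameter are all made up for
illustration. Dropping the pending row at the end of the cleanup pass is
also an assumption of the sketch.

```python
class PendingDocumentsSketch:
    """In-memory stand-ins for the two database tables and the Lucene index."""

    def __init__(self):
        self.pending = {}    # stand-in for the "Pending Documents" table
        self.documents = {}  # stand-in for the "Documents" table
        self.index = {}      # stand-in for Lucene's index

    def create(self, doc_id, text, fail_after_step=None):
        # Step 1: record the document as pending in the database.
        self.pending[doc_id] = text
        if fail_after_step == 1:
            raise RuntimeError("simulated failure after step 1")

        # Step 2: index the document (a real system would call Lucene here).
        self.index[doc_id] = text
        if fail_after_step == 2:
            raise RuntimeError("simulated failure after step 2")

        # Step 3: in a real database these two statements would run inside
        # one transaction, so that both happen or neither does.
        self.documents[doc_id] = text
        del self.pending[doc_id]

    def search(self, term):
        # Filter out hits that are still listed as pending.
        return sorted(
            doc_id
            for doc_id, text in self.index.items()
            if term in text and doc_id not in self.pending
        )

    def cleanup(self):
        # Periodic pass: remove pending documents from the index, then
        # drop the pending rows (the second part is an assumption here).
        for doc_id in list(self.pending):
            self.index.pop(doc_id, None)
            del self.pending[doc_id]
```

A failed create() leaves its document in the pending stand-in, so search()
hides it until cleanup() removes it from the index. In a real deployment the
cleanup pass would also have to avoid documents whose creation is still in
progress, which this sketch does not attempt.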

Note that if an error happens between steps 2 and 3, or in step 3, the
document will still be found in the Pending Documents table. So you are
effectively implementing a rollback for the whole procedure by deleting
whatever is found in this table. If everything goes well, you remove the
document from Pending Documents, and then you know it exists both in the
database and in Lucene's index. Also, if an error happens after step 1,
you are simply left with an entry in the Pending Documents table, which
you can remove when you discover that there is no corresponding document
in Lucene's index.

This is of course rather ad hoc and does not generalize well to other
types of queries (e.g. updates). But it can be a viable workaround if
you don't want the added complexity of efforts like Compass.

What do you think?

Marios Skounakis

----- Original Message -----
From: "Beto Siless"
Sent: Thursday, November 17, 2005 11:18 PM
Subject: Re: Lucene & Transactional semantics

> Hi, I'm with the transaction problem too: I have Documents which are
> represented by a Business Object (persisted in a DB with an ORM),
> indexed with Lucene and finally stored in the file system. So it's
> very difficult to maintain consistency in an error scenario.
>
> The main problem is that if you implement some ad-hoc transaction with
> Lucene (working in a RAMDirectory or keeping the commands to apply
> until the end), you still have to coordinate the Lucene transaction
> with the others. Because if the Lucene transaction rolls back you can
> abort the DB transaction, but if the Lucene transaction commits you
> can't do anything if the DB transaction fails without a 3pc
> transaction manager.
>
> Does anybody have an idea about how to reduce the error time window?
> Could this problem be solved by storing the index in a database?
>
> Thanks
> Beto
>
> Marios Skounakis wrote:
>> Hi all,
>>
>> I am interested in developing a system which will use Lucene to
>> implement the search functionality.
>> A key characteristic of this system is that certain information
>> about the indexed documents will be editable by the user
>> administrators. For instance, the user administrators can manually
>> create "document collections" and assign some of the indexed
>> documents to them. One way to implement document collections would
>> be to give documents a dedicated field for storing the document
>> collection id, and to store the document collection information in
>> a database.
>>
>> Ideally, an operation such as the above should have transactional
>> semantics, i.e. if a user wants to assign documents x, y and z to
>> collection C, then either all three documents should be assigned to
>> the collection or, in case of error, none of the documents should be
>> assigned to the collection. Also, if the operation were to be
>> followed by an SQL query to update the database with the number of
>> documents assigned to collection C, that should be included in the
>> "transaction" as well.
>>
>> Is there a straightforward way to do this with Lucene? Or are
>> "transactions" a no-no for a system like Lucene and I should just go
>> ahead without having transactional semantics?
>>
>> Thanks in advance,
>>
>> Marios Skounakis

---------------------------------------------------------------------
To unsubscribe, e-mail: java-user-unsubscribe@lucene.apache.org
For additional commands, e-mail: java-user-help@lucene.apache.org