lucene-java-user mailing list archives

Site index · List index
Message view « Date » · « Thread »
Top « Date » · « Thread »
From "Marios Skounakis" <>
Subject Re: Lucene & Transactional semantics
Date Fri, 18 Nov 2005 13:46:12 GMT

Here is an idea I have been working on as a workaround:

Suppose you want to create a new document. The steps to do that are:
1. Insert the document into a "Pending Documents" table in the database.
2. Index the document with Lucene
3. Insert the document into the "Documents" table in the database, and 
remove it from the "Pending Documents" table in a single transaction.

Periodically delete from Lucene's index Documents found in the "Pending 
Documents" table.

Also, when returning results, filter out Documents found in the "Pending 
Documents" table.

Basically, the Pending Documents table stores the documents that have been 
indexed by Lucene but have not yet been inserted in the database. Note how 
if an error happens between steps 2 and 3, or in step 3, the document will 
be found in the Pending Documents table. So you are kind of implementing a 
rollback for the whole procedure by deleting whatever is found in this 
table. If everything goes well, you remove the Document from Pending 
Documents, and then you know it exists both in the database and Lucene's 

Also, if an error happens after 1, you are simply left with an entry in the 
Pending Documents table which you can remove when you discover that there is 
no corresponding document in Lucene's index.

This is of course rather ad-hoc and does not generalize well to other types 
of queries (e.g. updates, etc). But it can be a viable workaround if you 
don't want the added complexity of efforts like Compass.

What do you think?

Marios Skounakis

----- Original Message ----- 
From: "Beto Siless" <>
To: <>
Sent: Thursday, November 17, 2005 11:18 PM
Subject: Re: Lucene & Transactional semantics

> Hi, I'm with the transaction problem too: I have Documents which are 
> represented by a Business Object (persisted in a DB with an ORM), indexed 
> with Lucene and finally stored in the file system. So it's very difficult 
> to maintain the consistency in an error scenario.
> The main problem is that if you implement some ad-hoc transaction with 
> Lucene (working in a RAMDirectory or keeping the commands to apply until 
> the end), you still have to coordinate the lucene transaction with the 
> others. Cause if lucene transaction rollbacks you can abort the db 
> transaction, but if lucene transaction commits you can't do anything if 
> the DB transaction fails with out a 3pc transaction manager.
> Does Anybody have an idea about how to reduce the error time window? Could 
> this problem be solved storing the index in a database?
> Thanks
> Beto
> Marios Skounakis wrote:
>> Hi all,
>> I am interested in developing a system which will use Lucene to implement 
>> the search functionality. A key characteristic of this system is that 
>> certain information about the indexed documents will be editable by the 
>> user administrators. For instance, the user administrators can manually 
>> create "document collections" and assign some of the indexed documents to 
>> them. One way to implement document collections would by having documents 
>> have a dedicated field for storing the document collection id, and 
>> storing the document collection information in a database.
>> Ideally, such an operation as the above should have transactional 
>> semantics, i.e. if a user wants to assign documents x, y and z to 
>> collection C, then either all three documents should be assigned to the 
>> collection or, in case of error, none of the documents should be assigned 
>> to the collection. Also, if the operation were to be followed by an SQL 
>> query to update the database with the number of documents assigned to 
>> collection C, that should be included in the "transaction" as well.
>> Is there a straightforward way to do this with Lucene? Or are 
>> "transactions" a no-no for a system like Lucene and I should just go 
>> ahead without having transactional semantics?
>> Thanks in advance,
>> Marios Skounakis
>> ------------------------------------------------------------------------
>> No virus found in this incoming message.
>> Checked by AVG Free Edition.
>> Version: 7.1.362 / Virus Database: 267.13.1/169 - Release Date: 
>> 11/15/2005
> ---------------------------------------------------------------------
> To unsubscribe, e-mail:
> For additional commands, e-mail:

To unsubscribe, e-mail:
For additional commands, e-mail:

View raw message