lucene-dev mailing list archives

From Christoph Goller <gol...@detego-software.de>
Subject Re: Deleting a document with an IndexWriter open
Date Fri, 16 Jul 2004 13:58:20 GMT
Giulio Cesare Solaroli wrote:
> I have been thinking about this for a while, but could not find out a
> reasonable solution.
> The basic problems are:
> - where do I (safely) store the index of the documents that need to be deleted?
> - how can I uniquely identify the Lucene documents that I have to
> delete, given that there are several Lucene documents matching a
> single "real" document?
> 
> The second problem could be "easily" solved by adding a kind of version
> field (stored in the Lucene index) that is incremented every time a
> new version of a document is inserted. In this way, when searching for
> duplicate documents (using the "real" document ID) I will find a set
> of Lucene documents and I could delete all but the one with the
> highest version number.

You need unique document ids. They can either be generated on the
full-text index side (example 1) or come from outside (example 2):

1) You could use a unique id for every document added to the Lucene index
(a kind of counter for the number of added documents). You have to generate
this number yourself; it is not provided by Lucene. We are doing this
in some applications. The unique id is stored in a dedicated field, and in
your database you associate it with your document. If you change a
document in the database, you look up its unique id there and thus know
which document to delete from the Lucene index. When the changed document is
added to the Lucene index again, it gets a new unique id, which you store with
the changed document in your database.
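
A minimal sketch of this bookkeeping, assuming plain in-memory maps as
stand-ins for both the Lucene index and the database. The class
CounterIdIndex and its methods are hypothetical and are not Lucene API; in
Lucene itself the delete step would be a delete by term on the dedicated id
field, which is not shown here.

```java
import java.util.HashMap;
import java.util.Map;

// Hypothetical stand-in for illustration only; none of this is Lucene API.
class CounterIdIndex {
    // Application-managed counter: the index does not hand out ids for us.
    private long nextUid = 1;
    // Stands in for the Lucene index: value of the unique-id field -> content.
    private final Map<Long, String> index = new HashMap<>();
    // Stands in for the database: record key -> unique id of its indexed copy.
    private final Map<String, Long> database = new HashMap<>();

    /** Adds or replaces the indexed copy of a record and returns its new unique id. */
    long store(String recordKey, String content) {
        Long oldUid = database.get(recordKey);
        if (oldUid != null) {
            index.remove(oldUid);        // delete the stale copy by its unique id
        }
        long uid = nextUid++;            // the changed document gets a fresh id
        index.put(uid, content);
        database.put(recordKey, uid);    // remember the new id with the record
        return uid;
    }

    /** Number of documents currently in the stand-in index. */
    int indexedCount() {
        return index.size();
    }

    /** Indexed content for a record, or null if the record was never stored. */
    String contentFor(String recordKey) {
        Long uid = database.get(recordKey);
        return uid == null ? null : index.get(uid);
    }
}
```

Calling store twice with the same record key leaves exactly one indexed
copy, and the second call returns a different id from the first.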

2) In another application we store the URL of each document in the Lucene index.
If the document behind a URL has changed, we know which document to delete
from the Lucene index simply by its URL, and we add the new version of the
document again with the same URL field.
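
The same idea with an external key can be sketched as follows, again with an
in-memory list standing in for the Lucene index. The class UrlKeyedIndex,
its methods, and the field names "url" and "content" are assumptions made
for illustration, not Lucene API.

```java
import java.util.ArrayList;
import java.util.HashMap;
import java.util.List;
import java.util.Map;

// Hypothetical stand-in for illustration only; none of this is Lucene API.
class UrlKeyedIndex {
    // Each document is modelled as a map of field name -> field value.
    private final List<Map<String, String>> docs = new ArrayList<>();

    /** Deletes every document whose field equals the value; returns how many were removed. */
    int deleteByField(String field, String value) {
        int before = docs.size();
        docs.removeIf(doc -> value.equals(doc.get(field)));
        return before - docs.size();
    }

    /** Replaces whatever is indexed under the URL with the new content. */
    void reindex(String url, String content) {
        deleteByField("url", url);       // the URL alone identifies the old version
        Map<String, String> doc = new HashMap<>();
        doc.put("url", url);             // store the URL again with the new version
        doc.put("content", content);
        docs.add(doc);
    }

    /** Number of documents currently in the stand-in index. */
    int size() {
        return docs.size();
    }

    /** Indexed content for a URL, or null if nothing is indexed under it. */
    String contentFor(String url) {
        for (Map<String, String> doc : docs) {
            if (url.equals(doc.get("url"))) {
                return doc.get("content");
            }
        }
        return null;
    }
}
```

Because the key comes from outside, no id has to be written back to a
database; the price is that the URL must really be unique per document.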

Christoph



