lucene-java-user mailing list archives

From Otis Gospodnetic <otis_gospodne...@yahoo.com>
Subject Re: deleting/updating/identifying a document
Date Fri, 20 Jul 2007 13:17:51 GMT
Hi Samuel,

Indeed, you can have a PK-like identifier field in each Lucene Document and use deleteDocuments(new
Term("your PK field", "your ID")).
While having an ID field that uniquely identifies a document is not a must, it is a Lucene
best practice in my book and experience so far.
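A minimal sketch of that pattern follows. It assumes a recent Lucene release (the 9.x API, not the 2.x API that was current when this thread was written) and uses a hypothetical primary-key field named "id". The ID is indexed as a single untokenized term through StringField, updateDocument() replaces whatever document carries that term, and deleteDocuments() removes it.

```java
import org.apache.lucene.analysis.standard.StandardAnalyzer;
import org.apache.lucene.document.Document;
import org.apache.lucene.document.Field;
import org.apache.lucene.document.StringField;
import org.apache.lucene.document.TextField;
import org.apache.lucene.index.DirectoryReader;
import org.apache.lucene.index.IndexWriter;
import org.apache.lucene.index.IndexWriterConfig;
import org.apache.lucene.index.Term;
import org.apache.lucene.store.ByteBuffersDirectory;
import org.apache.lucene.store.Directory;

public class DeleteById {

    // Hypothetical primary-key field name; any untokenized field would do.
    static final String ID_FIELD = "id";

    static Document makeDoc(String id, String body) {
        Document doc = new Document();
        // StringField indexes the whole value as one term, so "doc-1"
        // can later be matched exactly by new Term("id", "doc-1").
        doc.add(new StringField(ID_FIELD, id, Field.Store.YES));
        // TextField is tokenized: fine for searching, unsuitable as a key.
        doc.add(new TextField("body", body, Field.Store.NO));
        return doc;
    }

    /** Returns live-document counts after the update step and after the delete step. */
    static int[] run() throws Exception {
        try (Directory dir = new ByteBuffersDirectory();
             IndexWriter writer = new IndexWriter(dir, new IndexWriterConfig(new StandardAnalyzer()))) {
            writer.addDocument(makeDoc("doc-1", "first version"));
            writer.addDocument(makeDoc("doc-2", "another document"));

            // updateDocument deletes every document containing the term, then adds the new one.
            writer.updateDocument(new Term(ID_FIELD, "doc-1"), makeDoc("doc-1", "second version"));
            writer.commit();
            int afterUpdate;
            try (DirectoryReader reader = DirectoryReader.open(dir)) {
                afterUpdate = reader.numDocs(); // numDocs() excludes deleted documents
            }

            // Delete by primary key, independent of internal document numbers.
            writer.deleteDocuments(new Term(ID_FIELD, "doc-2"));
            writer.commit();
            int afterDelete;
            try (DirectoryReader reader = DirectoryReader.open(dir)) {
                afterDelete = reader.numDocs();
            }
            return new int[] {afterUpdate, afterDelete};
        }
    }

    public static void main(String[] args) throws Exception {
        int[] counts = run();
        System.out.println(counts[0] + " " + counts[1]); // prints "2 1"
    }
}
```

Because the key is a stored, untokenized term, the same Term works for lookups, updates, and deletes, no matter how segment merges renumber documents internally.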

Otis
--
Lucene Consulting -- http://lucene-consulting.com/


----- Original Message ----
From: Samuel LEMOINE <samuel.lemoine@lingway.com>
To: java-user@lucene.apache.org
Sent: Friday, July 20, 2007 3:00:57 PM
Subject: deleting/updating/identifying a document

Hi everybody!

I've been wondering how Lucene handles document deletion.
As far as I know, a document is identified by a document number, but 
this number is not reliable in the long term, because it may 
change when segments are merged.
The way Lucene deletes a document's data from the index puzzles me, 
because it relies on terms (or on the document number, which, as noted above, is not 
reliable and must be retrieved through a query). The methods I've found 
for deleting documents from the index are in the IndexWriter and 
IndexReader classes: deleteDocuments(Term) and deleteDocuments(Term[]).
These methods delete the documents containing the given term. 
According to the API javadoc, deleteDocuments(Term[]) deletes every 
document that contains at least one of the given terms. If it really works 
this way, I don't understand why. Wouldn't it be 
more useful if it deleted only the documents containing *all* of the given terms? 
(Or maybe that is how it actually works?)
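The any-versus-all behaviour can be checked directly. The sketch below assumes a recent Lucene (9.x) API and made-up "color" and "shape" fields. deleteDocuments(Term...) removes documents that match any of the terms, while an all-terms delete can be expressed as a BooleanQuery with MUST clauses passed to deleteDocuments(Query...).

```java
import org.apache.lucene.analysis.standard.StandardAnalyzer;
import org.apache.lucene.document.Document;
import org.apache.lucene.document.Field;
import org.apache.lucene.document.StringField;
import org.apache.lucene.index.DirectoryReader;
import org.apache.lucene.index.IndexWriter;
import org.apache.lucene.index.IndexWriterConfig;
import org.apache.lucene.index.Term;
import org.apache.lucene.search.BooleanClause;
import org.apache.lucene.search.BooleanQuery;
import org.apache.lucene.search.TermQuery;
import org.apache.lucene.store.ByteBuffersDirectory;
import org.apache.lucene.store.Directory;

public class AnyVsAllDelete {

    // Hypothetical two-field documents, indexed as untokenized terms.
    static Document doc(String color, String shape) {
        Document d = new Document();
        d.add(new StringField("color", color, Field.Store.YES));
        d.add(new StringField("shape", shape, Field.Store.YES));
        return d;
    }

    /** Indexes four documents, applies one delete, and returns the live count. */
    static int remainingAfter(boolean requireAll) throws Exception {
        try (Directory dir = new ByteBuffersDirectory();
             IndexWriter w = new IndexWriter(dir, new IndexWriterConfig(new StandardAnalyzer()))) {
            w.addDocument(doc("red", "circle"));
            w.addDocument(doc("red", "square"));
            w.addDocument(doc("blue", "circle"));
            w.addDocument(doc("blue", "square"));

            Term red = new Term("color", "red");
            Term circle = new Term("shape", "circle");
            if (requireAll) {
                // AND semantics: only documents containing both terms match.
                BooleanQuery.Builder b = new BooleanQuery.Builder();
                b.add(new TermQuery(red), BooleanClause.Occur.MUST);
                b.add(new TermQuery(circle), BooleanClause.Occur.MUST);
                w.deleteDocuments(b.build());
            } else {
                // OR semantics: every document containing at least one term is deleted.
                w.deleteDocuments(red, circle);
            }
            w.commit();
            try (DirectoryReader r = DirectoryReader.open(dir)) {
                return r.numDocs();
            }
        }
    }

    public static void main(String[] args) throws Exception {
        System.out.println(remainingAfter(false)); // prints 1: only blue/square survives
        System.out.println(remainingAfter(true));  // prints 3: only red/circle is deleted
    }
}
```

The OR form is convenient for deleting a batch of primary keys in one call; the Query form covers the "all of the given terms" case.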
These reflections lead me to conclude that, in order to remove 
the entries of a specific document from a Lucene index, we must store an 
untokenized field that uniquely identifies each document. I find it strange 
to need such an "artifice" to keep track of documents 
independently. It's not a big obstacle, it's just... strange.
Any thoughts on this matter are welcome :)

Thanks for reading,

Samuel

---------------------------------------------------------------------
To unsubscribe, e-mail: java-user-unsubscribe@lucene.apache.org
For additional commands, e-mail: java-user-help@lucene.apache.org
