lucene-java-user mailing list archives

From Daniel Noll <>
Subject Can I delete without shuffling document IDs?
Date Fri, 29 Jun 2007 03:08:56 GMT
Hi all.

Is there currently any way to delete documents from the middle of a text index 
without a risk of the document IDs changing later?  I'm aware that they 
probably won't change unless we optimise or unless the user adds more data, 
but unfortunately adding more data is now a potential occurrence.

I already know I can simulate this behaviour by keeping a filter on disk and 
using it with every query, but if there is some tricky way to do it natively 
I might be able to save some of the potential overhead, as well as some disk 
space by not storing the documents we're not using anymore.
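
To make the filter idea concrete, here is a rough sketch of what I mean. It 
deliberately uses no Lucene classes: DeletedDocFilter and its methods are 
names made up for illustration, and the byte array it produces is assumed to 
be written to and read from disk by whatever means is convenient.

```java
import java.util.BitSet;
import java.util.stream.IntStream;

// Hypothetical helper, not part of Lucene: tracks logically deleted
// documents so they can be hidden from results without touching the index.
final class DeletedDocFilter {
    private final BitSet deleted;

    DeletedDocFilter() {
        this(new BitSet());
    }

    private DeletedDocFilter(BitSet deleted) {
        this.deleted = deleted;
    }

    // Marks a document as deleted; the index itself is left unchanged,
    // so every other document keeps its ID.
    void markDeleted(int docId) {
        deleted.set(docId);
    }

    boolean isLive(int docId) {
        return !deleted.get(docId);
    }

    // Drops deleted documents from an array of hit document IDs.
    int[] filterHits(int[] hitDocIds) {
        return IntStream.of(hitDocIds).filter(this::isLive).toArray();
    }

    // Serialised form of the filter, suitable for storing in a file.
    byte[] toBytes() {
        return deleted.toByteArray();
    }

    // Rebuilds the filter from bytes previously produced by toBytes().
    static DeletedDocFilter fromBytes(byte[] bytes) {
        return new DeletedDocFilter(BitSet.valueOf(bytes));
    }
}
```

The overhead I am worried about is visible here: every hit has to pass 
through filterHits, and the deleted documents still occupy space in the index.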

(The reason we want the IDs to be stable is that we need a way to correlate 
the documents with an external database, and retrieving the document from 
the Hits object was too slow compared to retrieving just the ID.)

Another potential way around this is to maintain a mapping table from the 
actual document ID to a stable sequence ID (e.g. if documents 1000 through 
1999 are deleted, there would be an entry in the table saying that sequence 
ID 2000 now starts at document ID 1000).
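
A minimal sketch of such a table, assuming the runs of surviving documents 
are recorded by hand after each compaction. StableIdMap, addRun and 
toStableId are hypothetical names, and nothing here is Lucene API.

```java
import java.util.Map;
import java.util.TreeMap;

// Hypothetical mapping table: translates the document ID Lucene reports
// after compaction back to the stable sequence ID used externally.
final class StableIdMap {
    // Key: first document ID of a run of surviving documents.
    // Value: the stable sequence ID of the document at that position.
    private final TreeMap<Integer, Integer> runs = new TreeMap<>();

    StableIdMap() {
        runs.put(0, 0); // before any deletion the two IDs are identical
    }

    // Records that the run starting at docId holds the documents whose
    // stable sequence IDs start at stableId.
    void addRun(int docId, int stableId) {
        runs.put(docId, stableId);
    }

    // Finds the run containing docId and applies that run's offset.
    int toStableId(int docId) {
        Map.Entry<Integer, Integer> run = runs.floorEntry(docId);
        if (run == null) {
            throw new IllegalArgumentException("negative document ID: " + docId);
        }
        return run.getValue() + (docId - run.getKey());
    }
}
```

With the example above, calling addRun(1000, 2000) makes toStableId(999) 
return 999 and toStableId(1000) return 2000. The table only grows by one 
entry per deleted range, so lookups stay cheap even for large indexes.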

I just wanted to put the question out in case someone has solved the exact 
same problem already.


Daniel Noll
Nuix Pty Ltd
Suite 79, 89 Jones St, Ultimo NSW 2007, Australia    Ph: +61 2 9280 0699
Web:                               Fax: +61 2 9212 6902
