lucene-java-user mailing list archives

From "Johan Stuyts" <j.stu...@hippo.nl>
Subject Preventing merging by IndexWriter
Date Tue, 17 Oct 2006 14:33:54 GMT
Hi,

(I am using Lucene 2.0.0)

I have been looking at a way to use stable IDs with Lucene. The reason I
want this is so I can efficiently store and retrieve information outside
of Lucene for filtering search results. It looks like this is going to
require most of Lucene to be rewritten, so I gave up on that approach.

I have a new idea where I want the document IDs to change only at a
specific moment instead of whenever Lucene chooses to do so. This way the
document IDs remain stable and I can use these IDs in the external data.
I want to merge the segments of the index at a specific moment because
updating the external data to match the new document IDs is too
expensive to do continuously. At the moment that I want to merge the
segments of the index causing the document IDs to change, I can also
update my external data so the correct data is attached to the correct
Lucene document ID. If I understand correctly, merging only shifts
document IDs to remove deleted document IDs, so I can do the same
shifting with the external data by getting the set of deleted documents
before the merge.
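
To make the shifting concrete, here is a minimal sketch in plain Java. It rests on the assumption stated above, that a merge keeps the surviving documents in their original order and only closes the gaps left by deletions; I have not verified that this holds in every case. The class and method names (DocIdRemap, buildRemap, compact) are hypothetical and not part of Lucene. The deleted set would presumably be filled by looping from 0 to IndexReader.maxDoc() and calling IndexReader.isDeleted(int) before the merge.

```java
import java.util.BitSet;

public class DocIdRemap {

    /**
     * Maps each old document ID to its ID after the merge, assuming the merge
     * preserves document order and only removes deleted documents.
     * Deleted documents map to -1.
     */
    static int[] buildRemap(BitSet deleted, int maxDoc) {
        int[] remap = new int[maxDoc];
        int next = 0;
        for (int oldId = 0; oldId < maxDoc; oldId++) {
            remap[oldId] = deleted.get(oldId) ? -1 : next++;
        }
        return remap;
    }

    /** Copies the external per-document values to their new positions. */
    static long[] compact(long[] external, int[] remap) {
        int live = 0;
        for (int newId : remap) {
            if (newId >= 0) {
                live++;
            }
        }
        long[] out = new long[live];
        for (int oldId = 0; oldId < remap.length; oldId++) {
            if (remap[oldId] >= 0) {
                out[remap[oldId]] = external[oldId];
            }
        }
        return out;
    }

    public static void main(String[] args) {
        // Documents 1 and 3 were deleted before the merge.
        BitSet deleted = new BitSet();
        deleted.set(1);
        deleted.set(3);

        int[] remap = buildRemap(deleted, 5);
        long[] external = {10L, 11L, 12L, 13L, 14L};
        long[] shifted = compact(external, remap);

        System.out.println(java.util.Arrays.toString(remap));   // [0, -1, 1, -1, 2]
        System.out.println(java.util.Arrays.toString(shifted)); // [10, 12, 14]
    }
}
```

The remap table has to be built from the deleted set as it was just before the merge, because that information is gone once the merge has completed.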

I already set 'mergeFactor' and 'maxBufferedDocs' to very high values so
all documents of a batch will be stored in RAM. The problem I am facing
is that the IndexWriter merges the segments in RAM with the segments on
disk when I close the IndexWriter. What I need instead is that the
IndexWriter will create a new segment on disk containing the data from
the segment(s) in RAM. This way the document IDs of the existing disk
segments are not affected.

Creating a new segment instead of merging with the existing ones will
also cause lots of segments with a variable number of documents to be
created on disk, but I believe the IndexReader/IndexSearcher is able to
handle this. I only have to make sure that the number of segments does
not become too high (i.e. merge regularly) because this might cause 'too
many open files' errors.

So my questions are: is there a way to prevent the IndexWriter from
merging, forcing it to create a new segment for each indexing batch? And
if so, will it still be possible to merge the disk segments when I want
to?

Kind regards,

Johan Stuyts
Hippo 
