lucene-java-user mailing list archives

Site index · List index
Message view « Date » · « Thread »
Top « Date » · « Thread »
From "Erick Erickson" <>
Subject Re: Preventing index corruption
Date Thu, 26 Jun 2008 14:42:08 GMT
How big is your index? The simpleminded way would be to copy things around
as your batches come in and only switch to the *real* one after the
were verified.

You could also just maintain two indexes but only update one at a time. In
99.99% case where things went well, it would just be a matter of continuing
Whenever "something bad happened", you could copy the good index over the
bad one and go at it again.

But to ask that no matter what, the index is OK is asking a lot.... There
are fires and floods and earthquakes to consider <G>


On Thu, Jun 26, 2008 at 10:28 AM, Eran Sevi <> wrote:

> Hi,
> I'm looking for the correct way to create an index given the following
> restrictions:
> 1. The documents are received in batches of variable sizes (not more then
> 100 docs in a batch).
> 2. The batch insertion must be transactional - either the whole batch is
> added to the index (exists physically on the disk), or the whole batch is
> canceled/aborted and the index remains as before.
> 3. The index must remain valid at all times and shouldn't be corrupted even
> if a power interruption occurs - *most important*
> 4. Index speed is less important than search speed.
> How should I use a writer with all these restrictions? Can I do it without
> having to close the writer after each batch (maybe flush is enough)?
> Should I change the IndexWriter parameters such as mergeFactor,
> RAMBufferSize, etc. ?
> I want to make sure that partial batches are not written to the disk (if
> the
> computer crashes in the middle of the batch, I want to be able to work with
> the index as it was before the crash).
> If I'm working with a single writer, is it guaranteed that no matter what
> happens the index can be opened and used (I don't mind loosing docs, just
> that the index won't be ruined).
> Thanks and sorry about the long list of questions,
> Eran.

  • Unnamed multipart/alternative (inline, None, 0 bytes)
View raw message