lucene-java-user mailing list archives

From "Eran Sevi" <erans...@gmail.com>
Subject Re: Preventing index corruption
Date Thu, 26 Jun 2008 15:11:31 GMT
Thanks Erick.
You might be joking, but one of our clients indeed had all his servers
destroyed in a flood. Of course, in such a rare case, a solution would be
to keep a backup at another site.

However, I'm still confused about the normal scenarios:

Let's say that in the middle of a batch I get an exception and want to
roll back. Can I do this?
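
Something like the following is what I'm hoping is possible (just a rough
sketch on my side; I'm assuming the writer exposes commit/rollback-style
calls, and I'm not sure of the exact method names across Lucene versions):

    // Sketch only: writer is our single IndexWriter, batch is the current
    // list of Documents; commit()/rollback() stand for whatever the
    // transactional calls are really named in the version at hand.
    try {
        for (Document doc : batch) {
            writer.addDocument(doc);  // buffered in RAM, not yet durable
        }
        writer.commit();              // make the whole batch durable at once
    } catch (Exception e) {
        writer.rollback();            // drop everything since the last commit
        throw e;
    }
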
I want to make sure that a batch is written to disk after it finishes (and
only then), and not find out a while later, during a commit, that something
went wrong. Do I have to close the writer, or is a flush enough? I thought
about raising mergeFactor and other parameters to high values (or disabling
them) so that an automatic merge/commit won't happen, and then deciding
manually when to commit the changes - the size of the batches is not
constant, so I can't determine the thresholds in advance.

I don't mind hurting indexing performance a bit by doing this manually, but
I can't afford to let the client think that the information is safely in
the index and then find out that it's not.
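
To make the question concrete, these are the knobs I mean (again just a
sketch; whether these are even the right settings to touch is part of my
question, and the setter names are from memory):

    // Assumed: dir is a Directory, analyzer is our Analyzer,
    // false = open an existing index rather than create one.
    IndexWriter writer = new IndexWriter(dir, analyzer, false);
    // Never trigger a flush by document count; batches are <= 100 docs.
    writer.setMaxBufferedDocs(IndexWriter.DISABLE_AUTO_FLUSH);
    // Large RAM buffer so a flush can't sneak in mid-batch.
    writer.setRAMBufferSizeMB(256);
    // High merge factor so merges are postponed until we decide.
    writer.setMergeFactor(30);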

My index contains a few million docs and its size can reach about 30GB
(we're saving a lot of fields and information for each document). Keeping
a backup index is an option I considered, but I wanted to avoid the
overhead of keeping the two synchronized (they might not be on the same
server, which exposes a lot of new problems, like network issues).
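
For what it's worth, the copy step in your two-index idea (quoted below)
doesn't worry me much, since an index directory is just a flat set of
files - roughly something like this made-up helper (assumes java.io
imports, and that nothing is writing to src during the copy):

    // Copy a (flat) index directory file by file.
    static void copyIndex(File src, File dst) throws IOException {
        dst.mkdirs();
        for (File f : src.listFiles()) {   // no subdirectories in an index
            FileInputStream in = new FileInputStream(f);
            FileOutputStream out = new FileOutputStream(new File(dst, f.getName()));
            byte[] buf = new byte[64 * 1024];
            for (int n; (n = in.read(buf)) != -1; ) {
                out.write(buf, 0, n);
            }
            out.close();
            in.close();
        }
    }

It's keeping the two copies synchronized that worries me.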

Thanks.

On Thu, Jun 26, 2008 at 5:42 PM, Erick Erickson <erickerickson@gmail.com>
wrote:

> How big is your index? The simpleminded way would be to copy things
> around as your batches come in and only switch to the *real* one after
> the additions were verified.
>
> You could also just maintain two indexes but only update one at a time.
> In the 99.99% case where things went well, it would just be a matter of
> continuing on. Whenever "something bad happened", you could copy the
> good index over the bad one and go at it again.
>
> But to ask that no matter what, the index is OK is asking a lot.... There
> are fires and floods and earthquakes to consider <G>
>
> Best
> Erick
>
> On Thu, Jun 26, 2008 at 10:28 AM, Eran Sevi <eransevi@gmail.com> wrote:
>
> > Hi,
> >
> > I'm looking for the correct way to create an index given the following
> > restrictions:
> >
> > 1. The documents are received in batches of variable sizes (not more
> > than 100 docs in a batch).
> > 2. The batch insertion must be transactional - either the whole batch is
> > added to the index (exists physically on the disk), or the whole batch is
> > canceled/aborted and the index remains as before.
> > 3. The index must remain valid at all times and shouldn't be corrupted
> > even if a power interruption occurs - *most important*
> > 4. Index speed is less important than search speed.
> >
> > How should I use a writer with all these restrictions? Can I do it
> > without having to close the writer after each batch (maybe flush is
> > enough)?
> >
> > Should I change the IndexWriter parameters such as mergeFactor,
> > RAMBufferSize, etc.?
> > I want to make sure that partial batches are not written to the disk
> > (if the computer crashes in the middle of the batch, I want to be able
> > to work with the index as it was before the crash).
> >
> > If I'm working with a single writer, is it guaranteed that no matter
> > what happens the index can be opened and used? (I don't mind losing
> > docs, just that the index won't be ruined.)
> >
> > Thanks and sorry about the long list of questions,
> > Eran.
> >
>
