lucene-dev mailing list archives

From Itamar Syn-Hershko <ita...@code972.com>
Subject Re: Corrupt index
Date Thu, 14 Jun 2012 00:45:46 GMT
Mike,

On Wed, Jun 13, 2012 at 7:31 PM, Michael McCandless <lucene@mikemccandless.com> wrote:

> Hi Itamar,
>
> One quick question: does Lucene.Net include the fixes done for
> LUCENE-1044 (to fsync files on commit)?  Those are very important for
> an index to be intact after OS/JVM crash or power loss.
>

Definitely. As Christopher noted, we are about to release a 3.0.3-compatible
version, which is a line-by-line port of the Java version.


> You shouldn't even have to run CheckIndex ... because (as of
> LUCENE-1044) we now fsync all segment files before writing the new
> segments_N file, and then removing old segments_N files (and any
> segments that are no longer referenced).
>
> You do have to remove the write.lock if you aren't using
> NativeFSLockFactory (but this has been the default lock impl for a
> while now).
>

Somewhat unrelated to this thread, but what should I expect to see? From
time to time we do see write.lock present after an app crash or power
failure. Also, what steps are we expected to perform in such cases?


>
> > Last week I was playing with rather large indexes and crashed my app
> > while it was indexing. I wasn't able to open the index, and Luke was even
> > kind enough to wipe the index folder clean even though I opened it in
> > read-only mode. I re-ran this, and after another crash, running CheckIndex
> > revealed nothing - the index was detected to be an empty one. I am not
> > entirely sure what could be the cause for this, but I suspect it has
> > been corrupted by the crash.
>
> Had no commit completed (no segments file written)?
>
> If you don't fsync then all sorts of crazy things are possible...
>

Ok, so we do have fsync since LUCENE-1044 is present, and there were
segments present from previous commits. Any idea what went wrong?
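
For reference, here is a minimal sketch of the commit ordering described above: sync every segment file, only then write and sync the new segments_N, and only then remove older segments_N files. This is not Lucene's IndexWriter or FSDirectory code; the class name, the helper methods and the plain-text file contents are all hypothetical, and the cleanup of unreferenced segment files is left out.

```java
import java.io.IOException;
import java.io.UncheckedIOException;
import java.nio.ByteBuffer;
import java.nio.channels.FileChannel;
import java.nio.charset.StandardCharsets;
import java.nio.file.DirectoryStream;
import java.nio.file.Files;
import java.nio.file.Path;
import java.nio.file.StandardOpenOption;
import java.util.Map;

// Hypothetical illustration only; not Lucene's actual commit implementation.
final class CommitOrderSketch {

    // Write a file, then force its bytes to stable storage before returning.
    static void writeAndSync(Path file, String content) throws IOException {
        try (FileChannel ch = FileChannel.open(file,
                StandardOpenOption.CREATE,
                StandardOpenOption.WRITE,
                StandardOpenOption.TRUNCATE_EXISTING)) {
            ByteBuffer buf = ByteBuffer.wrap(content.getBytes(StandardCharsets.UTF_8));
            while (buf.hasRemaining()) {
                ch.write(buf);
            }
            ch.force(true); // the fsync step: flush data and metadata
        }
    }

    // One commit: segment files first, commit point second, cleanup last.
    static void commit(Path dir, long generation, Map<String, String> segmentFiles) {
        try {
            // 1. Make every segment file durable.
            for (Map.Entry<String, String> entry : segmentFiles.entrySet()) {
                writeAndSync(dir.resolve(entry.getKey()), entry.getValue());
            }
            // 2. Only now write and sync the commit point that references them.
            String current = "segments_" + Long.toString(generation, Character.MAX_RADIX);
            writeAndSync(dir.resolve(current), String.join("\n", segmentFiles.keySet()));
            // 3. Only after the new commit point is durable, drop older ones.
            try (DirectoryStream<Path> commits = Files.newDirectoryStream(dir, "segments_*")) {
                for (Path p : commits) {
                    if (!p.getFileName().toString().equals(current)) {
                        Files.delete(p);
                    }
                }
            }
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
    }

    // Small helper so callers do not have to deal with checked exceptions.
    static Path newTempDir() {
        try {
            return Files.createTempDirectory("commit-sketch");
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
    }

    public static void main(String[] args) {
        Path dir = newTempDir();
        commit(dir, 1, Map.of("_0.cfs", "first segment"));
        commit(dir, 2, Map.of("_1.cfs", "second segment"));
        System.out.println("index directory: " + dir);
    }
}
```

With this ordering, a crash before step 2 completes leaves the previous segments_N untouched, so a reader would still open the older commit. If the sync in step 1 is silently skipped, that guarantee is gone.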


> > I've been looking at these:
> >
> > https://issues.apache.org/jira/browse/LUCENE-3418?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
> >
> > https://issues.apache.org/jira/browse/LUCENE-2328?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
>
> (And LUCENE-1044 before that ... it was LUCENE-1044 that LUCENE-2328 broke...).
>

So 2328 broke 1044, and this was fixed only in 3.4, right? That is, 2328 made
it into a 3.0.x release, while the fix for it (3418) was only released in 3.4.
Am I right?

If this is the case, 2328 probably made its way into Lucene.Net, since we are
using the released sources for porting, and we now need to apply 3418 in
the current version.

Does it make sense to just port FSDirectory from 3.4 to 3.0.3? Or were
there API or other changes that would make our life miserable if we did that?


>
> > And it seems like this is what I was experiencing. Mike and Mark will
> > probably be able to tell if this is what they saw or not, but as far as I
> > can tell this is not an expected behavior of a Lucene index.
>
> Definitely not expected behavior: assuming nothing is flipping bits,
> then on OS/JVM crash or power loss your index should be fine, just
> reverted to the last successful commit.
>

What I suspected. I will try to reproduce this reliably - any recommendations?
Not really feeling like reinventing the wheel here...

MockDirectoryWrapper hasn't been ported yet, as it appears to exist only in
3.4, and as you said it won't really help here anyway.
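
To make "reverted to the last successful commit" concrete, here is a rough sketch of how a reader could pick the commit point to open: take the segments_N file with the highest generation and ignore everything else. The class and method names are hypothetical; the base-36 parsing is how I understand the Java version names segments_N files, but treat that as an assumption, and note that the real code has additional fallbacks (such as segments.gen) that are not shown.

```java
import java.util.Arrays;
import java.util.Collection;

// Hypothetical illustration only; not Lucene's SegmentInfos logic.
final class LatestCommitSketch {

    static final String PREFIX = "segments_";

    // Returns the highest commit generation found, or -1 if there is no commit point.
    static long latestGeneration(Collection<String> fileNames) {
        long latest = -1;
        for (String name : fileNames) {
            if (!name.startsWith(PREFIX)) {
                continue; // segment data files, segments.gen, write.lock, ...
            }
            try {
                // Assumption: the suffix is the generation in base 36.
                long gen = Long.parseLong(name.substring(PREFIX.length()), Character.MAX_RADIX);
                latest = Math.max(latest, gen);
            } catch (NumberFormatException e) {
                // Not a well-formed commit point name; skip it.
            }
        }
        return latest;
    }

    public static void main(String[] args) {
        // "_2.fdt" stands for a segment file from an interrupted, uncommitted flush.
        long gen = latestGeneration(Arrays.asList("_0.cfs", "_1.cfs", "segments_2", "_2.fdt"));
        System.out.println("open commit generation: " + gen);
    }
}
```

Files written after the last segments_N (like "_2.fdt" in the usage above) are simply never referenced, which is why a crash in the middle of indexing should cost only the uncommitted documents.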


>
> > What I'm looking for at the moment is some advice on what FSDirectory
> > implementation to use to make sure no corruption can happen. The 3.4
> > version (which is where LUCENE-3418 was committed to) seems to handle a
> > lot of things the 3.0 doesn't, but on the other hand LUCENE-3418 was
> > introduced by changes made to the 3.0 codebase.
>
> Hopefully it's just that you are missing fsync!
>
> > Also, is there any test in the suite checking for those scenarios?
>
> Our test framework has a sneaky MockDirectoryWrapper that, after a
> test finishes, goes and corrupts any unsync'd files and then verifies
> the index is still OK... it's good because it'll catch any times we
> are missing calls to sync, but it's not low-level enough: if
> FSDir is failing to actually call fsync (that was the bug in
> LUCENE-3418) then it won't catch that...
>
> Mike McCandless
>
> http://blog.mikemccandless.com
>
>
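
For anyone following along, the kind of check Mike describes can be sketched roughly like this. It is an in-memory toy, not the real MockDirectoryWrapper: the class and method names are made up, and where the real wrapper corrupts unsynced files, this one simply drops them.

```java
import java.util.HashMap;
import java.util.HashSet;
import java.util.Map;
import java.util.Set;

// Hypothetical illustration only; not Lucene's MockDirectoryWrapper.
final class CrashSimulatingDirectorySketch {

    private final Map<String, String> files = new HashMap<>();
    private final Set<String> unsynced = new HashSet<>();

    // Every write makes the file "dirty" until sync is requested for it.
    void write(String name, String content) {
        files.put(name, content);
        unsynced.add(name);
    }

    // Records that the index code ASKED for a sync. Whether a real
    // directory underneath actually reaches the disk is not visible here.
    void sync(String name) {
        unsynced.remove(name);
    }

    // Simulated crash: anything that was never synced is lost.
    void crash() {
        for (String name : unsynced) {
            files.remove(name);
        }
        unsynced.clear();
    }

    boolean exists(String name) {
        return files.containsKey(name);
    }

    public static void main(String[] args) {
        CrashSimulatingDirectorySketch dir = new CrashSimulatingDirectorySketch();
        dir.write("_0.cfs", "segment data");
        dir.sync("_0.cfs");
        dir.write("_1.cfs", "never synced");
        dir.crash();
        System.out.println("_0.cfs survives: " + dir.exists("_0.cfs"));
        System.out.println("_1.cfs survives: " + dir.exists("_1.cfs"));
    }
}
```

The bookkeeping lives entirely in the wrapper, which is the limitation mentioned above: a missing call to sync is caught, but a directory whose sync does nothing at the OS level would still look fine to a test like this.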
