lucene-dev mailing list archives

From Troy Howard <thowar...@gmail.com>
Subject Re: Corrupt index
Date Thu, 14 Jun 2012 21:36:28 GMT
> If this is the case, 2328 probably made its way to Lucene.Net since we are
> using the released sources for porting, and we now need to apply 3418 in
> the current version.

Itamar: I confirmed that 2328 is in the latest code.

Thanks,
Troy


On Wed, Jun 13, 2012 at 5:45 PM, Itamar Syn-Hershko <itamar@code972.com> wrote:
> Mike,
>
> On Wed, Jun 13, 2012 at 7:31 PM, Michael McCandless <
> lucene@mikemccandless.com> wrote:
>
>> Hi Itamar,
>>
>> One quick question: does Lucene.Net include the fixes done for
>> LUCENE-1044 (to fsync files on commit)?  Those are very important for
>> an index to be intact after OS/JVM crash or power loss.
>>
>
> Definitely, as Christopher noted we are about to release a 3.0.3 compatible
> version, which is line-by-line port of the Java version.
>
>
>> You shouldn't even have to run CheckIndex ... because (as of
>> LUCENE-1044) we now fsync all segment files before writing the new
>> segments_N file, and then removing old segments_N files (and any
>> segments that are no longer referenced).
>>
>> You do have to remove the write.lock if you aren't using
>> NativeFSLockFactory (but this has been the default lock impl for a
>> while now).
>>
>
> Somewhat unrelated to this thread, but what should I expect to see? From
> time to time we do see write.lock present after an app crash or power
> failure. Also, what steps are expected to be performed in such
> cases?
>
>
>>
>> > Last week I have been playing with rather large indexes and crashed my app
>> > while it was indexing. I wasn't able to open the index, and Luke was even
>> > kind enough to wipe the index folder clean even though I opened it in
>> > read-only mode. I re-ran this, and after another crash running CheckIndex
>> > revealed nothing - the index was detected to be an empty one. I am not
>> > entirely sure what could be the cause for this, but I suspect it has
>> > been corrupted by the crash.
>>
>> Had no commit completed (no segments file written)?
>>
>> If you don't fsync then all sorts of crazy things are possible...
>>
>
> Ok, so we do have fsync since LUCENE-1044 is present, and there were
> segments present from previous commits. Any idea what went wrong?
>
>
>> > I've been looking at these:
>> >
>> > https://issues.apache.org/jira/browse/LUCENE-3418?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
>> > https://issues.apache.org/jira/browse/LUCENE-2328?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
>>
>> (And LUCENE-1044 before that ... it was LUCENE-1044 that LUCENE-2328 broke...).
>>
>
> So 2328 broke 1044, and this was fixed only in 3.4, right? So 2328 made it
> to a 3.0.x release while the fix for it (3418) was only released in 3.4. Am
> I right?
>
> If this is the case, 2328 probably made its way to Lucene.Net since we are
> using the released sources for porting, and we now need to apply 3418 in
> the current version.
>
> Does it make sense to just port FSDirectory from 3.4 to 3.0.3? Or were
> there API or other changes that will make our life miserable if we do that?
>
>
>>
>> > And it seems like this is what I was experiencing. Mike and Mark will
>> > probably be able to tell if this is what they saw or not, but as far as I
>> > can tell this is not an expected behavior of a Lucene index.
>>
>> Definitely not expected behavior: assuming nothing is flipping bits,
>> then on OS/JVM crash or power loss your index should be fine, just
>> reverted to the last successful commit.
>>
>
> What I suspected. Will try to reproduce reliably - any recommendations? Not
> really feeling like reinventing the wheel here...
>
> MockDirectoryWrapper wasn't ported yet as it seems to exist only in 3.4,
> and as you said it won't really help here anyway.
>
>
>>
>> > What I'm looking for at the moment is some advice on what FSDirectory
>> > implementation to use to make sure no corruption can happen. The 3.4
>> > version (which is where LUCENE-3418 was committed to) seems to handle a
>> > lot of things the 3.0 doesn't, but on the other hand LUCENE-3418 was
>> > introduced by changes made to the 3.0 codebase.
>>
>> Hopefully it's just that you are missing fsync!
>>
>> > Also, is there any test in the suite checking for those scenarios?
>>
>> Our test framework has a sneaky MockDirectoryWrapper that, after a
>> test finishes, goes and corrupts any unsync'd files and then verifies
>> the index is still OK... it's good because it'll catch any times we
>> are missing calls to sync, but, it's not low level enough such that if
>> FSDir is failing to actually call fsync (that was the bug in
>> LUCENE-3418) then it won't catch that...
>>
>> Mike McCandless
>>
>> http://blog.mikemccandless.com
>>
>> ---------------------------------------------------------------------
>> To unsubscribe, e-mail: dev-unsubscribe@lucene.apache.org
>> For additional commands, e-mail: dev-help@lucene.apache.org
>>
>>
>>
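The commit ordering Mike describes above (fsync all segment files, then write the new segments_N file, then remove old segments_N files) can be sketched roughly as follows. This is a minimal illustration using only the Java standard library, not Lucene's or Lucene.Net's actual FSDirectory code. The class name DurableCommitSketch, the helper names, the placeholder file contents, and the plain decimal generation in the segments_N name are all assumptions made for the example.

```java
import java.io.IOException;
import java.nio.ByteBuffer;
import java.nio.channels.FileChannel;
import java.nio.charset.StandardCharsets;
import java.nio.file.DirectoryStream;
import java.nio.file.Files;
import java.nio.file.Path;
import java.nio.file.StandardOpenOption;
import java.util.ArrayList;
import java.util.List;

// Hypothetical sketch of "fsync data files first, then publish the commit point".
public class DurableCommitSketch {

    /** Writes data to a file and forces it to stable storage before returning. */
    public static void writeAndSync(Path file, byte[] data) throws IOException {
        try (FileChannel ch = FileChannel.open(file,
                StandardOpenOption.CREATE,
                StandardOpenOption.WRITE,
                StandardOpenOption.TRUNCATE_EXISTING)) {
            ByteBuffer buf = ByteBuffer.wrap(data);
            while (buf.hasRemaining()) {
                ch.write(buf);
            }
            // The fsync step: without it, a crash may leave the file empty or partial.
            ch.force(true);
        }
    }

    /** Performs one commit and returns the path of the new commit point. */
    public static Path commit(Path dir, long generation, List<String> segmentNames)
            throws IOException {
        // 1. Write and sync every segment file (placeholder contents).
        for (String name : segmentNames) {
            byte[] data = ("data for " + name).getBytes(StandardCharsets.UTF_8);
            writeAndSync(dir.resolve(name), data);
        }

        // 2. Only now write and sync the commit point that references them.
        Path commitPoint = dir.resolve("segments_" + generation);
        byte[] listing = String.join("\n", segmentNames).getBytes(StandardCharsets.UTF_8);
        writeAndSync(commitPoint, listing);

        // 3. Remove older commit points once the new one is durable.
        List<Path> stale = new ArrayList<>();
        try (DirectoryStream<Path> stream = Files.newDirectoryStream(dir, "segments_*")) {
            for (Path p : stream) {
                if (!p.equals(commitPoint)) {
                    stale.add(p);
                }
            }
        }
        for (Path p : stale) {
            Files.delete(p);
        }
        return commitPoint;
    }

    public static void main(String[] args) throws IOException {
        Path dir = Files.createTempDirectory("commit-sketch");
        commit(dir, 1, List.of("_0.cfs"));
        Path latest = commit(dir, 2, List.of("_0.cfs", "_1.cfs"));
        System.out.println("latest commit point: " + latest.getFileName());
    }
}
```

If the process dies before step 2 completes, the previous segments_N file is still the newest complete commit point, which matches the "reverted to the last successful commit" behavior described in the thread. A directory implementation that skips the force call, as discussed for LUCENE-3418, would lose that guarantee.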


