lucene-dev mailing list archives

From Michael McCandless <luc...@mikemccandless.com>
Subject Re: Corrupt index
Date Fri, 15 Jun 2012 11:32:36 GMT
I think the 0-segment segments_1 file is expected in Lucene.Net since
we changed that later, in 3.1 in Lucene (LUCENE-2386)?

Mike McCandless

http://blog.mikemccandless.com

On Thu, Jun 14, 2012 at 8:40 PM, Itamar Syn-Hershko <itamar@code972.com> wrote:
> I can confirm 2.9.4 had autoCommit, but it is gone in 3.0.3 already, so
> Lucene.Net doesn't have autoCommit.
>
> So I don't have autoCommit set to true, but I can clearly see a segments_1
> file there along with the other files. If that helps, it always keeps the
> name segments_1 and stays at 32 bytes; it never changes.
>
> And again, if I kill the process and try to open the index with Luke 3.3,
> the index folder is being wiped out.
>
> Not sure what to make of all that.
>
> On Fri, Jun 15, 2012 at 3:21 AM, Michael McCandless
> <lucene@mikemccandless.com> wrote:
>>
>> Hmm, OK: in 2.9.4 / 3.0.x, if you open IW on a new directory, it will
>> make a zero-segment commit.  This was changed/fixed in 3.1 with
>> LUCENE-2386.
>>
>> In 2.9.x (not 3.0.x) there is still an autoCommit parameter,
>> defaulting to false, but if you set it to true then IndexWriter will
>> periodically commit.
>>
>> Seeing segment files created and merged is definitely expected, but
>> it's not expected to see segments_N files unless you pass
>> autoCommit=true.
>>
>> Mike McCandless
>>
>> http://blog.mikemccandless.com
>>
>> On Thu, Jun 14, 2012 at 8:14 PM, Itamar Syn-Hershko <itamar@code972.com>
>> wrote:
>> > Not what I'm seeing. I actually see a lot of segments created and merged
>> > while it operates. Expected?
>> >
>> > Reminding you, this is 2.9.4 / 3.0.3
>> >
>> > On Fri, Jun 15, 2012 at 3:10 AM, Michael McCandless
>> > <lucene@mikemccandless.com> wrote:
>> >>
>> >> Right: Lucene never autocommits anymore ...
>> >>
>> >> If you create a new index, add a bunch of docs, and things crash
>> >> before you have a chance to commit, then there is no index (not even a
>> >> 0 doc one) in that directory.
>> >>
>> >> Mike McCandless
>> >>
>> >> http://blog.mikemccandless.com
>> >>
>> >> On Thu, Jun 14, 2012 at 1:41 PM, Itamar Syn-Hershko
>> >> <itamar@code972.com> wrote:
>> >> > I'm quite certain this shouldn't happen even when Commit wasn't
>> >> > called.
>> >> >
>> >> > Mike, can you comment on that?
>> >> >
>> >> > On Thu, Jun 14, 2012 at 8:03 PM, Christopher Currens
>> >> > <currens.chris@gmail.com> wrote:
>> >> >>
>> >> >> Well, the only thing I see is that there is no place where
>> >> >> writer.Commit() is called in the delegate assigned to
>> >> >> corpusReader.OnDocument.  I know that lucene is very transactional,
>> >> >> and at least in 3.x, the writer will never auto commit to the
>> >> >> index.  You can write millions of documents, but if commit is never
>> >> >> called, those documents aren't actually part of the index.
>> >> >> Committing isn't a cheap operation, so you definitely don't want to
>> >> >> do it on every document.
>> >> >>
>> >> >> You can test it yourself with this (naive) solution.  Right below
>> >> >> the writer.SetUseCompoundFile(false) line, add
>> >> >> "int numDocsAdded = 0;".  At the end of the
>> >> >> corpusReader.OnDocument delegate add:
>> >> >>
>> >> >> // Example only.  I wouldn't suggest committing this often
>> >> >> if(++numDocsAdded % 5 == 0)
>> >> >> {
>> >> >>    writer.Commit();
>> >> >> }
>> >> >>
>> >> >> I had the application crash for real on this file:
>> >> >>
>> >> >> http://dumps.wikimedia.org/gawiktionary/20120613/gawiktionary-20120613-pages-meta-history.xml.bz2,
>> >> >> about 20% into the operation.  Without the commit, the index is
>> >> >> empty.  Add it in, and I get 755 files in the index after it
>> >> >> crashes.
>> >> >>
>> >> >>
>> >> >> Thanks,
>> >> >> Christopher
>> >> >>
>> >> >> On Wed, Jun 13, 2012 at 6:13 PM, Itamar Syn-Hershko
>> >> >> <itamar@code972.com> wrote:
>> >> >>
>> >> >> > Yes, reproduced in first try. See attached program - I referenced
>> >> >> > it to current trunk.
>> >> >> >
>> >> >> >
>> >> >> > On Thu, Jun 14, 2012 at 3:54 AM, Itamar Syn-Hershko
>> >> >> > <itamar@code972.com> wrote:
>> >> >> >
>> >> >> >> Christopher,
>> >> >> >>
>> >> >> >> I used the IndexBuilder app from here
>> >> >> >> https://github.com/synhershko/Talks/tree/master/LuceneNeatThings
>> >> >> >> with an 8.5GB wikipedia dump.
>> >> >> >>
>> >> >> >> After running for 2.5 days I had to forcefully close it (infinite
>> >> >> >> loop in the wiki-markdown parser at 92%, go figure), and the
>> >> >> >> 40-something GB index I had by then was unusable. I then was able
>> >> >> >> to reproduce this.
>> >> >> >>
>> >> >> >> Please note I now added a few safe-guards you might want to remove
>> >> >> >> to make sure the app really crashes on process kill.
>> >> >> >>
>> >> >> >> I'll try to come up with a better way to reproduce this -
>> >> >> >> hopefully Mike will be able to suggest better ways than manual
>> >> >> >> process kill...
>> >> >> >>
>> >> >> >> On Thu, Jun 14, 2012 at 1:41 AM, Christopher Currens <
>> >> >> >> currens.chris@gmail.com> wrote:
>> >> >> >>
>> >> >> >>> Mike, The codebase for lucene.net should be almost identical to
>> >> >> >>> java's 3.0.3 release, and LUCENE-1044 is included in that.
>> >> >> >>>
>> >> >> >>> Itamar, are you committing the index regularly?  I only ask
>> >> >> >>> because I can't reproduce it myself by forcibly terminating the
>> >> >> >>> process while it's indexing.  I've tried both 3.0.3 and 2.9.4.
>> >> >> >>> If I don't commit at all and terminate the process (even with
>> >> >> >>> 10,000 4K documents created), there will be no documents in the
>> >> >> >>> index when I open it in luke, which I expect.  If I commit at
>> >> >> >>> 10,000 documents, and terminate it a few thousand after that, the
>> >> >> >>> index has the first ten thousand that were committed.  I've even
>> >> >> >>> terminated it *while* a second commit was taking place, and it
>> >> >> >>> still had all of the documents I expected.
>> >> >> >>>
>> >> >> >>> It may be that I'm not reproducing it correctly.  Do you have a
>> >> >> >>> minimal amount of code that can reproduce it?
>> >> >> >>>
>> >> >> >>>
>> >> >> >>> Thanks,
>> >> >> >>> Christopher
>> >> >> >>>
>> >> >> >>> On Wed, Jun 13, 2012 at 9:31 AM, Michael McCandless <
>> >> >> >>> lucene@mikemccandless.com> wrote:
>> >> >> >>>
>> >> >> >>> > Hi Itamar,
>> >> >> >>> >
>> >> >> >>> > One quick question: does Lucene.Net include the fixes done for
>> >> >> >>> > LUCENE-1044 (to fsync files on commit)?  Those are very
>> >> >> >>> > important for an index to be intact after OS/JVM crash or power
>> >> >> >>> > loss.
>> >> >> >>> >
>> >> >> >>> > More responses below:
>> >> >> >>> >
>> >> >> >>> > On Tue, Jun 12, 2012 at 8:20 PM, Itamar Syn-Hershko <
>> >> >> >>> > itamar@code972.com> wrote:
>> >> >> >>> >
>> >> >> >>> > > I'm a Lucene.Net committer, and there is a chance we have a
>> >> >> >>> > > bug in our FSDirectory implementation that causes indexes to
>> >> >> >>> > > get corrupted when indexing is cut while the IW is still
>> >> >> >>> > > open. As it roots from some retroactive fixes you made, I'd
>> >> >> >>> > > appreciate your feedback.
>> >> >> >>> > >
>> >> >> >>> > > Correct me if I'm wrong, but by design Lucene should be able
>> >> >> >>> > > to recover rather quickly from power failures or app crashes.
>> >> >> >>> > > Since existing segment files are read only, only new segments
>> >> >> >>> > > that are still being written can get corrupted. Hence,
>> >> >> >>> > > recovering from worst-case scenarios is done by simply
>> >> >> >>> > > removing the write.lock file. The worst that could happen
>> >> >> >>> > > then is having the last segment damaged, and that can be
>> >> >> >>> > > fixed by removing those files, possibly by running CheckIndex
>> >> >> >>> > > on the index.
>> >> >> >>> >
>> >> >> >>> > You shouldn't even have to run CheckIndex ... because (as of
>> >> >> >>> > LUCENE-1044) we now fsync all segment files before writing the
>> >> >> >>> > new segments_N file, and then removing old segments_N files
>> >> >> >>> > (and any segments that are no longer referenced).
>> >> >> >>> >
>> >> >> >>> > You do have to remove the write.lock if you aren't using
>> >> >> >>> > NativeFSLockFactory (but this has been the default lock impl
>> >> >> >>> > for a while now).
>> >> >> >>> >
>> >> >> >>> > > Last week I have been playing with rather large indexes and
>> >> >> >>> > > crashed my app while it was indexing. I wasn't able to open
>> >> >> >>> > > the index, and Luke was even kind enough to wipe the index
>> >> >> >>> > > folder clean even though I opened it in read-only mode. I
>> >> >> >>> > > re-ran this, and after another crash running CheckIndex
>> >> >> >>> > > revealed nothing - the index was detected to be an empty one.
>> >> >> >>> > > I am not entirely sure what could be the cause for this, but
>> >> >> >>> > > I suspect it has been corrupted by the crash.
>> >> >> >>> >
>> >> >> >>> > Had no commit completed (no segments file written)?
>> >> >> >>> >
>> >> >> >>> > If you don't fsync then all sorts of crazy things are
>> >> >> >>> > possible...
>> >> >> >>> >
>> >> >> >>> > > I've been looking at these:
>> >> >> >>> > >
>> >> >> >>> > > https://issues.apache.org/jira/browse/LUCENE-3418?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
>> >> >> >>> > >
>> >> >> >>> > > https://issues.apache.org/jira/browse/LUCENE-2328?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
>> >> >> >>> >
>> >> >> >>> > (And LUCENE-1044 before that ... it was LUCENE-1044 that
>> >> >> >>> > LUCENE-2328 broke...).
>> >> >> >>> >
>> >> >> >>> > > And it seems like this is what I was experiencing. Mike and
>> >> >> >>> > > Mark will probably be able to tell if this is what they saw
>> >> >> >>> > > or not, but as far as I can tell this is not an expected
>> >> >> >>> > > behavior of a Lucene index.
>> >> >> >>> >
>> >> >> >>> > Definitely not expected behavior: assuming nothing is flipping
>> >> >> >>> > bits, then on OS/JVM crash or power loss your index should be
>> >> >> >>> > fine, just reverted to the last successful commit.
>> >> >> >>> >
>> >> >> >>> > > What I'm looking for at the moment is some advice on what
>> >> >> >>> > > FSDirectory implementation to use to make sure no corruption
>> >> >> >>> > > can happen. The 3.4 version (which is where LUCENE-3418 was
>> >> >> >>> > > committed to) seems to handle a lot of things the 3.0
>> >> >> >>> > > doesn't, but on the other hand LUCENE-3418 was introduced by
>> >> >> >>> > > changes made to the 3.0 codebase.
>> >> >> >>> >
>> >> >> >>> > Hopefully it's just that you are missing fsync!
>> >> >> >>> >
>> >> >> >>> > > Also, is there any test in the suite checking for those
>> >> >> >>> > > scenarios?
>> >> >> >>> >
>> >> >> >>> > Our test framework has a sneaky MockDirectoryWrapper that,
>> >> >> >>> > after a test finishes, goes and corrupts any unsync'd files and
>> >> >> >>> > then verifies the index is still OK... it's good because it'll
>> >> >> >>> > catch any times we are missing calls to sync, but, it's not low
>> >> >> >>> > level enough such that if FSDir is failing to actually call
>> >> >> >>> > fsync (that was the bug in LUCENE-3418) then it won't catch
>> >> >> >>> > that...
>> >> >> >>> >
>> >> >> >>> > Mike McCandless
>> >> >> >>> >
>> >> >> >>> > http://blog.mikemccandless.com
>> >> >> >>> >
>> >> >> >>>
>> >> >> >>
>> >> >> >>
>> >> >> >
>> >> >
>> >> >
>> >>
>> >> ---------------------------------------------------------------------
>> >> To unsubscribe, e-mail: dev-unsubscribe@lucene.apache.org
>> >> For additional commands, e-mail: dev-help@lucene.apache.org
>> >>
>> >>
>> >
>>
>
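
The ordering described in the quoted replies (sync every segment file first, then write the new segments_N, so that a crash reverts the index to the last successful commit) can be illustrated with a toy model. This is only a sketch and is not Lucene or Lucene.Net code: the function names (write_synced, commit, latest_commit, demo), the "pending_" temporary name, the decimal generation suffix and the newline-separated file listing are all assumptions made for illustration.

```python
import os
import tempfile


def write_synced(path, data):
    """Write bytes to path and fsync them before returning."""
    with open(path, "wb") as f:
        f.write(data)
        f.flush()
        os.fsync(f.fileno())


def commit(index_dir, generation, segment_files):
    """Toy commit: sync all segment files, then publish segments_<generation>."""
    for name, data in segment_files.items():
        write_synced(os.path.join(index_dir, name), data)
    # The commit point is written last, so it never lists an unsynced file.
    listing = "\n".join(sorted(segment_files)).encode("utf-8")
    pending = os.path.join(index_dir, "pending_segments_%d" % generation)
    write_synced(pending, listing)
    # Hypothetical publishing step: rename the synced file into place.
    os.replace(pending, os.path.join(index_dir, "segments_%d" % generation))


def latest_commit(index_dir):
    """Return (generation, [file names]) for the newest commit, or None."""
    prefix = "segments_"
    gens = [int(n[len(prefix):]) for n in os.listdir(index_dir)
            if n.startswith(prefix)]
    if not gens:
        return None  # no commit ever completed: there is no index here
    newest = max(gens)
    with open(os.path.join(index_dir, prefix + str(newest)), "rb") as f:
        text = f.read().decode("utf-8")
    return newest, (text.split("\n") if text else [])


def demo():
    """Commit once, then simulate a crash after an uncommitted flush."""
    with tempfile.TemporaryDirectory() as d:
        before = latest_commit(d)
        commit(d, 1, {"_0.cfs": b"first batch of documents"})
        # A later segment reaches disk, but no segments_2 is ever written.
        with open(os.path.join(d, "_1.cfs"), "wb") as f:
            f.write(b"partial, never committed")
        after = latest_commit(d)
        return before, after
```

In this model demo() returns (None, (1, ['_0.cfs'])): before any commit the directory holds no index at all, and after the simulated crash the reader still sees only the files named by segments_1, while the stray _1.cfs is ignored.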
