Mailing-List: contact dev-help@lucene.apache.org; run by ezmlm
Precedence: bulk
Reply-To: dev@lucene.apache.org
Delivered-To: mailing list dev@lucene.apache.org
MIME-Version: 1.0
Sender: itamar.synhershko@gmail.com
Date: Fri, 15 Jun 2012 03:40:19 +0300
Subject: Re: Corrupt index
From: Itamar Syn-Hershko
To: dev@lucene.apache.org
Cc: lucene-net-dev@lucene.apache.org
Content-Type: multipart/alternative; boundary=14dae93410738a892504c2780f86

--14dae93410738a892504c2780f86
Content-Type: text/plain; charset=ISO-8859-1

I can confirm 2.9.4 had autoCommit, but it is gone in 3.0.3 already, so
Lucene.Net doesn't have autoCommit.

So I don't have autoCommit set to true, but I can clearly see a segments_1
file there along with the other files. If that helps, it always keeps the
name segments_1 with 32 bytes, never changes.

And again, if I kill the process and try to open the index with Luke 3.3,
the index folder is being wiped out.

Not sure what to make of all that.

On Fri, Jun 15, 2012 at 3:21 AM, Michael McCandless
<lucene@mikemccandless.com> wrote:

> Hmm, OK: in 2.9.4 / 3.0.x, if you open IW on a new directory, it will
> make a zero-segment commit. This was changed/fixed in 3.1 with
> LUCENE-2386.
>
> In 2.9.x (not 3.0.x) there is still an autoCommit parameter,
> defaulting to false, but if you set it to true then IndexWriter will
> periodically commit.
>
> Seeing segment files created and merged is definitely expected, but
> it's not expected to see segments_N files unless you pass
> autoCommit=true.
>
> Mike McCandless
>
> http://blog.mikemccandless.com
>
> On Thu, Jun 14, 2012 at 8:14 PM, Itamar Syn-Hershko
> <itamar@code972.com> wrote:
> > Not what I'm seeing. I actually see a lot of segments created and merged
> > while it operates. Expected?
> >
> > Reminding you, this is 2.9.4 / 3.0.3
> >
> > On Fri, Jun 15, 2012 at 3:10 AM, Michael McCandless
> > <lucene@mikemccandless.com> wrote:
> >>
> >> Right: Lucene never autocommits anymore ...
> >>
> >> If you create a new index, add a bunch of docs, and things crash
> >> before you have a chance to commit, then there is no index (not even a
> >> 0 doc one) in that directory.
> >>
> >> Mike McCandless
> >>
> >> http://blog.mikemccandless.com
> >>
> >> On Thu, Jun 14, 2012 at 1:41 PM, Itamar Syn-Hershko <itamar@code972.com>
> >> wrote:
> >> > I'm quite certain this shouldn't happen even when Commit wasn't called.
> >> >
> >> > Mike, can you comment on that?
> >> >
> >> > On Thu, Jun 14, 2012 at 8:03 PM, Christopher Currens
> >> > <currens.chris@gmail.com> wrote:
> >> >>
> >> >> Well, the only thing I see is that there is no place where
> >> >> writer.Commit() is called in the delegate assigned to
> >> >> corpusReader.OnDocument. I know that lucene is very transactional, and
> >> >> at least in 3.x, the writer will never auto commit to the index. You
> >> >> can write millions of documents, but if commit is never called, those
> >> >> documents aren't actually part of the index. Committing isn't a cheap
> >> >> operation, so you definitely don't want to do it on every document.
> >> >>
> >> >> You can test it yourself with this (naive) solution. Right below the
> >> >> writer.SetUseCompoundFile(false) line, add "int numDocsAdded = 0;". At
> >> >> the end of the corpusReader.OnDocument delegate add:
> >> >>
> >> >> // Example only. I wouldn't suggest committing this often
> >> >> if(++numDocsAdded % 5 == 0)
> >> >> {
> >> >>     writer.Commit();
> >> >> }
> >> >>
> >> >> I had the application crash for real on this file:
> >> >>
> >> >> http://dumps.wikimedia.org/gawiktionary/20120613/gawiktionary-20120613-pages-meta-history.xml.bz2,
> >> >> about 20% into the operation. Without the commit, the index is empty.
> >> >> Add it in, and I get 755 files in the index after it crashes.
> >> >>
> >> >>
> >> >> Thanks,
> >> >> Christopher
> >> >>
> >> >> On Wed, Jun 13, 2012 at 6:13 PM, Itamar Syn-Hershko
> >> >> <itamar@code972.com> wrote:
> >> >>
> >> >> > Yes, reproduced in first try. See attached program - I referenced it
> >> >> > to current trunk.
> >> >> >
> >> >> > On Thu, Jun 14, 2012 at 3:54 AM, Itamar Syn-Hershko
> >> >> > <itamar@code972.com> wrote:
> >> >> >
> >> >> >> Christopher,
> >> >> >>
> >> >> >> I used the IndexBuilder app from here
> >> >> >> https://github.com/synhershko/Talks/tree/master/LuceneNeatThings
> >> >> >> with an 8.5GB wikipedia dump.
> >> >> >>
> >> >> >> After running for 2.5 days I had to forcefully close it (infinite
> >> >> >> loop in the wiki-markdown parser at 92%, go figure), and the
> >> >> >> 40-something GB index I had by then was unusable. I then was able to
> >> >> >> reproduce this.
> >> >> >>
> >> >> >> Please note I now added a few safe-guards you might want to remove
> >> >> >> to make sure the app really crashes on process kill.
> >> >> >>
> >> >> >> I'll try to come up with a better way to reproduce this - hopefully
> >> >> >> Mike will be able to suggest better ways than manual process kill...
> >> >> >>
> >> >> >> On Thu, Jun 14, 2012 at 1:41 AM, Christopher Currens
> >> >> >> <currens.chris@gmail.com> wrote:
> >> >> >>
> >> >> >>> Mike, The codebase for lucene.net should be almost identical to
> >> >> >>> java's 3.0.3 release, and LUCENE-1044 is included in that.
> >> >> >>>
> >> >> >>> Itamar, are you committing the index regularly? I only ask because
> >> >> >>> I can't reproduce it myself by forcibly terminating the process
> >> >> >>> while it's indexing. I've tried both 3.0.3 and 2.9.4.
> >> >> >>> If I don't commit at all and terminate the process (even with
> >> >> >>> 10,000 4K documents created), there will be no documents in the
> >> >> >>> index when I open it in luke, which I expect. If I commit at 10,000
> >> >> >>> documents, and terminate it a few thousand after that, the index
> >> >> >>> has the first ten thousand that were committed. I've even
> >> >> >>> terminated it *while* a second commit was taking place, and it
> >> >> >>> still had all of the documents I expected.
> >> >> >>>
> >> >> >>> It may be that I'm not trying to reproduce it correctly. Do you
> >> >> >>> have a minimal amount of code that can reproduce it?
> >> >> >>>
> >> >> >>>
> >> >> >>> Thanks,
> >> >> >>> Christopher
> >> >> >>>
> >> >> >>> On Wed, Jun 13, 2012 at 9:31 AM, Michael McCandless
> >> >> >>> <lucene@mikemccandless.com> wrote:
> >> >> >>>
> >> >> >>> > Hi Itamar,
> >> >> >>> >
> >> >> >>> > One quick question: does Lucene.Net include the fixes done for
> >> >> >>> > LUCENE-1044 (to fsync files on commit)? Those are very important
> >> >> >>> > for an index to be intact after OS/JVM crash or power loss.
> >> >> >>> >
> >> >> >>> > More responses below:
> >> >> >>> >
> >> >> >>> > On Tue, Jun 12, 2012 at 8:20 PM, Itamar Syn-Hershko
> >> >> >>> > <itamar@code972.com> wrote:
> >> >> >>> >
> >> >> >>> > > I'm a Lucene.Net committer, and there is a chance we have a bug
> >> >> >>> > > in our FSDirectory implementation that causes indexes to get
> >> >> >>> > > corrupted when indexing is cut while the IW is still open. As it
> >> >> >>> > > stems from some retroactive fixes you made, I'd appreciate your
> >> >> >>> > > feedback.
> >> >> >>> > >
> >> >> >>> > > Correct me if I'm wrong, but by design Lucene should be able to
> >> >> >>> > > recover rather quickly from power failures or app crashes. Since
> >> >> >>> > > existing segment files are read only, only new segments that are
> >> >> >>> > > still being written can get corrupted. Hence, recovering from
> >> >> >>> > > worst-case scenarios is done by simply removing the write.lock
> >> >> >>> > > file. The worst that could happen then is having the last segment
> >> >> >>> > > damaged, and that can be fixed by removing those files, possibly
> >> >> >>> > > by running CheckIndex on the index.
> >> >> >>> >
> >> >> >>> > You shouldn't even have to run CheckIndex ... because (as of
> >> >> >>> > LUCENE-1044) we now fsync all segment files before writing the
> >> >> >>> > new segments_N file, and then removing old segments_N files (and
> >> >> >>> > any segments that are no longer referenced).
> >> >> >>> >
> >> >> >>> > You do have to remove the write.lock if you aren't using
> >> >> >>> > NativeFSLockFactory (but this has been the default lock impl for
> >> >> >>> > a while now).
> >> >> >>> >
> >> >> >>> > > Last week I have been playing with rather large indexes and
> >> >> >>> > > crashed my app while it was indexing. I wasn't able to open the
> >> >> >>> > > index, and Luke was even kind enough to wipe the index folder
> >> >> >>> > > clean even though I opened it in read-only mode. I re-ran this,
> >> >> >>> > > and after another crash running CheckIndex revealed nothing - the
> >> >> >>> > > index was detected to be an empty one.
> >> >> >>> > > I am not entirely sure what could be the cause for this, but I
> >> >> >>> > > suspect it has been corrupted by the crash.
> >> >> >>> >
> >> >> >>> > Had no commit completed (no segments file written)?
> >> >> >>> >
> >> >> >>> > If you don't fsync then all sorts of crazy things are possible...
> >> >> >>> >
> >> >> >>> > > I've been looking at these:
> >> >> >>> > >
> >> >> >>> > > https://issues.apache.org/jira/browse/LUCENE-3418?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
> >> >> >>> > >
> >> >> >>> > > https://issues.apache.org/jira/browse/LUCENE-2328?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
> >> >> >>> >
> >> >> >>> > (And LUCENE-1044 before that ... it was LUCENE-1044 that
> >> >> >>> > LUCENE-2328 broke...).
> >> >> >>> >
> >> >> >>> > > And it seems like this is what I was experiencing. Mike and Mark
> >> >> >>> > > will probably be able to tell if this is what they saw or not,
> >> >> >>> > > but as far as I can tell this is not an expected behavior of a
> >> >> >>> > > Lucene index.
> >> >> >>> >
> >> >> >>> > Definitely not expected behavior: assuming nothing is flipping
> >> >> >>> > bits, then on OS/JVM crash or power loss your index should be
> >> >> >>> > fine, just reverted to the last successful commit.
> >> >> >>> >
> >> >> >>> > > What I'm looking for at the moment is some advice on what
> >> >> >>> > > FSDirectory implementation to use to make sure no corruption can
> >> >> >>> > > happen.
> >> >> >>> > > The 3.4 version (which is where LUCENE-3418 was committed to)
> >> >> >>> > > seems to handle a lot of things the 3.0 doesn't, but on the other
> >> >> >>> > > hand LUCENE-3418 was introduced by changes made to the 3.0
> >> >> >>> > > codebase.
> >> >> >>> >
> >> >> >>> > Hopefully it's just that you are missing fsync!
> >> >> >>> >
> >> >> >>> > > Also, is there any test in the suite checking for those
> >> >> >>> > > scenarios?
> >> >> >>> >
> >> >> >>> > Our test framework has a sneaky MockDirectoryWrapper that, after
> >> >> >>> > a test finishes, goes and corrupts any unsync'd files and then
> >> >> >>> > verifies the index is still OK... it's good because it'll catch
> >> >> >>> > any times we are missing calls to sync, but, it's not low level
> >> >> >>> > enough such that if FSDir is failing to actually call fsync (that
> >> >> >>> > was the bug in LUCENE-3418) then it won't catch that...
> >> >> >>> >
> >> >> >>> > Mike McCandless
> >> >> >>> >
> >> >> >>> > http://blog.mikemccandless.com
> >>
> >> ---------------------------------------------------------------------
> >> To unsubscribe, e-mail: dev-unsubscribe@lucene.apache.org
> >> For additional commands, e-mail: dev-help@lucene.apache.org

--14dae93410738a892504c2780f86
Content-Type: text/html; charset=ISO-8859-1
Content-Transfer-Encoding: quoted-printable
--14dae93410738a892504c2780f86--
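
The commit ordering Mike describes in the thread (fsync all segment files first, only then write the new segments_N file) and the consequence Christopher observed (segments flushed without a commit are not part of the index) can be illustrated with a small file-based sketch. This is only a rough model under stated assumptions: `CommitSketch`, `flushSegment`, `commit`, and `openLatestCommit` are hypothetical names, not Lucene or Lucene.Net APIs, the segments_N file here is just a plain list of segment names, and generations are numbered in decimal for simplicity. Java is used as a stand-in since the thread concerns both Lucene and its .NET port.

```java
import java.io.IOException;
import java.io.UncheckedIOException;
import java.nio.ByteBuffer;
import java.nio.channels.FileChannel;
import java.nio.charset.StandardCharsets;
import java.nio.file.Files;
import java.nio.file.Path;
import java.nio.file.StandardOpenOption;
import java.util.Arrays;
import java.util.Comparator;
import java.util.List;
import java.util.Optional;
import java.util.stream.Stream;

// Hypothetical illustration only; none of these names are Lucene or Lucene.Net APIs.
final class CommitSketch {
    private static final String PREFIX = "segments_";

    static Path newTempDir() {
        try {
            return Files.createTempDirectory("commit-sketch");
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
    }

    // Write a file and force its bytes to stable storage before returning.
    static void writeAndSync(Path file, String content) {
        try (FileChannel ch = FileChannel.open(file, StandardOpenOption.CREATE,
                StandardOpenOption.WRITE, StandardOpenOption.TRUNCATE_EXISTING)) {
            ByteBuffer buf = ByteBuffer.wrap(content.getBytes(StandardCharsets.UTF_8));
            while (buf.hasRemaining()) {
                ch.write(buf);
            }
            ch.force(true);
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
    }

    // Flush: segment data lands in the directory, but no commit point references it yet.
    static void flushSegment(Path dir, String name, String payload) {
        try {
            Files.write(dir.resolve(name), payload.getBytes(StandardCharsets.UTF_8));
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
    }

    // Commit: sync every referenced segment file first, then publish segments_N.
    static void commit(Path dir, long generation, List<String> segments) {
        for (String name : segments) {
            try (FileChannel ch = FileChannel.open(dir.resolve(name), StandardOpenOption.WRITE)) {
                ch.force(true);
            } catch (IOException e) {
                throw new UncheckedIOException(e);
            }
        }
        writeAndSync(dir.resolve(PREFIX + generation), String.join("\n", segments));
    }

    private static long generationOf(Path p) {
        return Long.parseLong(p.getFileName().toString().substring(PREFIX.length()));
    }

    // Open: only the newest segments_N counts. Empty Optional means "no index at all";
    // a present but empty list models a zero-segment commit.
    static Optional<List<String>> openLatestCommit(Path dir) {
        try (Stream<Path> files = Files.list(dir)) {
            Optional<Path> latest = files
                    .filter(p -> p.getFileName().toString().startsWith(PREFIX))
                    .max(Comparator.comparingLong(CommitSketch::generationOf));
            if (!latest.isPresent()) {
                return Optional.empty();
            }
            return Optional.of(Files.readAllLines(latest.get(), StandardCharsets.UTF_8));
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
    }

    public static void main(String[] args) {
        Path dir = newTempDir();
        flushSegment(dir, "_0", "first batch of documents");
        System.out.println("before any commit: " + openLatestCommit(dir).isPresent());
        commit(dir, 1, Arrays.asList("_0"));
        flushSegment(dir, "_1", "second batch, never committed");
        System.out.println("visible segments: " + openLatestCommit(dir).get());
    }
}
```

In this model, a process killed after `flushSegment` but before `commit` leaves segment files on disk that a reader never sees, which is the behavior Christopher reports for an index with no commits. It does not attempt to explain the segments_1 file or the Luke wipe-out Itamar describes.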