lucene-java-user mailing list archives

From "Michael McCandless" <>
Subject Re: restoring a corrupt index?
Date Sun, 11 Nov 2007 18:59:48 GMT

"Yonik Seeley" <> wrote:
> On Nov 11, 2007 12:48 PM, Ryan McKinley <> wrote:
> > > Ryan are you able to update to that commit I just did?  If so I think
> > > you should run the tool without -fix and post back what it printed.  It
> > > should report an error on that one segment due to the missing file.
> > > Then, run -fix to remove that segment (please backup your index first!).
> > > Then, if you have a zillion segments in the index, try optimizing it?
> > >
> >
> > uggg.  Thanks for all your help / pointers.  I've been able to salvage a
> > functioning index - (with 800K fewer docs) - I'll just go back to an old
> > index and build from there.
> >
> > Optimizing reduced 180K files to 1800 - I'm guessing that had something
> > to do with why it hit a max open file limit.

> It shouldn't.  A single lucene segment should have about 8 or 9 files
> associated with it, so I don't know what the remaining 1800 are.  An
> index (with normal settings) should *never* get to 180K files that are
> actually part of the index.

Ryan can you post the output of CheckIndex on your now-working index?
(1800 is still too many files I think, certainly after having

Also, what steps finally allowed you to recover?  CheckIndex
(back-ported to 2.2) followed by optimize?
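
One rough way to sanity-check the file counts discussed above (about 8
or 9 files per segment) is to group the files in the index directory by
segment name. The Python sketch below is illustrative only: the function
name is hypothetical, and it assumes per-segment files are named
"_<segment>.<ext>" or "_<segment>_<gen>.<ext>", with everything else
(such as "segments_N" and "segments.gen") counted separately.

```python
import os
import re
from collections import Counter

# Assumption: per-segment files look like "_a.fdt" or "_a_1.del".
# Anything that does not match (e.g. "segments_3", "segments.gen")
# is counted under the key None.
SEGMENT_FILE = re.compile(r"^_([0-9a-z]+)(?:_[0-9a-z]+)?\.")


def files_per_segment(index_dir):
    """Return a Counter mapping segment name -> number of files."""
    counts = Counter()
    for name in os.listdir(index_dir):
        # Skip subdirectories; only regular files are counted.
        if not os.path.isfile(os.path.join(index_dir, name)):
            continue
        match = SEGMENT_FILE.match(name)
        counts[match.group(1) if match else None] += 1
    return counts
```

Segments with far more files than expected, or a very large number of
distinct segment names, would be the first things to look at alongside
the CheckIndex output.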

> > If I understand Yonik's comments, something in lucene 2.3 has changed so
> > that hitting the max open file limit could not result in a corrupt
> > index.  Is that true?

> I think Lucene has become more resistant to index corruption on a
> failure (like an exception).
> Just make sure you do a commit once in a while (old versions of lucene
> flushed a new segment and it was automatically added into the current
> view of the index... the newest version of lucene with
> lucene_autocommit=off won't do that).

I'm still baffled how Lucene 2.2 could ever produce a corrupt index
even on hitting descriptor limits or other exceptions.  I can see that
this could cause files to not be deleted properly, but I can't see
how it could corrupt the index.
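
On the descriptor limit itself, a crude check is to compare the number
of files in the index directory with the process's soft open-file limit.
This is only a sketch under a pessimistic assumption, namely that every
file in the directory could be held open at once; the function names and
the optional soft_limit parameter are hypothetical, and the resource
module is available on Unix only.

```python
import os
import resource


def count_index_files(index_dir):
    """Count regular files directly inside index_dir."""
    return sum(1 for entry in os.scandir(index_dir) if entry.is_file())


def exceeds_descriptor_limit(file_count, soft_limit=None):
    """Return True if file_count reaches the soft open-file limit.

    If soft_limit is not given, it is read from the current process.
    """
    if soft_limit is None:
        soft_limit, _hard = resource.getrlimit(resource.RLIMIT_NOFILE)
    if soft_limit == resource.RLIM_INFINITY:
        # No limit configured, so no count can exceed it.
        return False
    return file_count >= soft_limit
```

For example, exceeds_descriptor_limit(count_index_files(path)) gives a
quick yes/no for a given index directory under the current limit.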

Ryan can you share any details of how you (Solr) are using Lucene?  Are
you using autoCommit=false?  I'd really love to get to the root cause

