lucene-java-user mailing list archives

From "Yonik Seeley" <>
Subject Re: restoring a corrupt index?
Date Sun, 11 Nov 2007 18:32:02 GMT
On Nov 11, 2007 12:48 PM, Ryan McKinley <> wrote:
> > Ryan are you able to update to that commit I just did?  If so I think
> > you should run the tool without -fix and post back what it printed.  It
> > should report an error on that one segment due to the missing file.
> > Then, run -fix to remove that segment (please backup your index first!).
> > Then, if you have a zillion segments in the index, try optimizing it?
> >
> uggg.  Thanks for all your help / pointers.  I've been able to salvage a
> functioning index - (with 800K fewer docs) - I'll just go back to an old
> index and build from there.
> Optimizing reduced 180K files to 1800 - I'm guessing that had something
> to do with why it hit a max open file limit.
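The CheckIndex workflow quoted above can be sketched roughly as follows. The index path and jar name are placeholders for your own setup, and this assumes a Lucene build recent enough to include the CheckIndex tool:

```shell
# Assumptions: adjust these paths for your own installation.
INDEX_DIR=/path/to/index
LUCENE_JAR=lucene-core.jar

# Dry run: report the status of each segment (including errors such
# as a missing file) without modifying anything.
java -cp "$LUCENE_JAR" org.apache.lucene.index.CheckIndex "$INDEX_DIR"

# Back up the index before attempting any repair.
cp -a "$INDEX_DIR" "$INDEX_DIR.bak"

# -fix removes the broken segment(s), losing the documents they held.
java -cp "$LUCENE_JAR" org.apache.lucene.index.CheckIndex "$INDEX_DIR" -fix
```

Note that `-fix` is destructive: the documents in the dropped segment are gone, which is why the backup step matters.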

It shouldn't.  A single Lucene segment should have about 8 or 9 files
associated with it, so I don't know what the remaining 1800 are.  An
index (with normal settings) should *never* accumulate 180K files that
are actually part of the index.
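One quick way to sanity-check the "8 or 9 files per segment" figure is to count files per segment name; non-compound segment files all start with an underscore-prefixed segment id followed by a dot. This is a sketch only, and the index directory path is an assumption:

```shell
INDEX_DIR=/path/to/index   # assumption: adjust for your setup

# Group index files by segment id (e.g. _0, _1, ...) and count them.
# A healthy non-compound segment yields ~8-9 files; a compound-file
# segment shows up as a single .cfs entry.
ls "$INDEX_DIR" | sed -n 's/^\(_[0-9a-z]*\)\..*/\1/p' | sort | uniq -c
```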

> I'm running a standard solr-trunk from a few weeks back (nothing custom,
> no 'embedded' stuff).  The index is constantly updating a few files, so
> I am not surprised by the mass of unused files optimize cleans up.

Any file that looks like an index file is now cleaned up once it is no
longer referenced.  So the fact that optimize removed those files
doesn't mean they were in use beforehand.

> If I understand Yonik's comments, something in lucene 2.3 has changed so
> that hitting the max open file limit could not result in a corrupt
> index.  Is that true?

I think Lucene has become more resistant to index corruption on a
failure (like an exception).
Just make sure you do a commit once in a while: older versions of
Lucene flushed a new segment and automatically added it into the
current view of the index, but the newest version of Lucene with
lucene_autocommit=off won't do that.
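Since this setup is Solr (rather than raw Lucene), the periodic explicit commit can be issued against Solr's XML update handler. The URL below is a hypothetical default; adjust the host, port, and path for your deployment:

```shell
# Assumption: Solr's update handler at its default location.
SOLR_UPDATE_URL=http://localhost:8983/solr/update

# An explicit commit makes flushed segments part of the committed
# index view -- important when autocommit is off.
curl -s -H 'Content-Type: text/xml' --data-binary '<commit/>' "$SOLR_UPDATE_URL"
```

Running something like this from cron every few minutes is one simple way to bound how much uncommitted work can be lost on a crash.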

> In the meantime would optimizing more often help
> avoid this?

I think the root cause is more likely running out of descriptors.  It
may be that you just need a higher limit.
If you are using Linux, see this link for how to check/set the number
of system-wide descriptors:
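As a sketch of the usual checks (the limit value below is only an example):

```shell
# Per-process open-file limit for the current shell:
ulimit -n

# System-wide descriptor limit (Linux):
cat /proc/sys/fs/file-max

# Raising the per-process limit for this shell (example value; making
# it permanent is typically done in /etc/security/limits.conf):
# ulimit -n 65536
```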

