lucene-java-user mailing list archives

Site index · List index
Message view « Date » · « Thread »
Top « Date » · « Thread »
From "Yonik Seeley" <>
Subject Re: restoring a corrupt index?
Date Sat, 10 Nov 2007 22:22:56 GMT
On Nov 10, 2007 5:01 PM, Michael McCandless <> wrote:
> "Yonik Seeley" <> wrote:
> > > How can this lead to index corruption?  The "no such file or directory" on
> > > loading _cf9.fnm sounds like index corruption?
> >
> > I don't think older versions of lucene handled these errors as well.
> > Perhaps _cf9.fnm failed to be written, but the segments file succeeded.
> > It could also be Solr's fault for allowing further operations on an
> > index after one failed?  I'm not sure how that should be handled.
> Ahh, OK.
> Yeah it's not clear how Solr should handle this case, though Lucene
> should be at (get to?) the point where on exception no harm to the
> index can be done.  Ie, index on disk is left consistent, and, the
> state of the writer is such that it can't corrupt the index even if
> it's further used after the exception.

Right... keep in mind concurrent updates too... it's pretty much
impossible for the user-level to avoid other calls to addDocument()
from going ahead since they can be in different threads.

> OK, though, writer doesn't use many descriptors at all, right?

Yeah, but other things might be usng up the descriptors... there is
probably an open searcher for searching, possibly another one warming
in the background,   plus whatever else is going on in the servlet

Ryan, are you using stock solr, or embedded?  Anything that might
prevent old searchers from being closed (you chould check the log
files for #opens vs #closes)

>  It
> opens 1 segment's worth to flush, and mergeFactor+1 segment's worth to
> merge.  It's spooky if Lucene can create zillions of un-referenced
> files.

Perhaps unreferenced files may not be deleted if an exception is
encountered first, or perhaps even the deleter is failing due to lack
of descriptors.

> Would be good to get to the root cause & make sure it's really
> fixed on trunk.

Is there a tool we could have ryan point at the segments file and get
a dump of the referenced segments?

> Or, maybe they are all referenced -- I think there were issues with
> the old merge policy that could cause segments to not be merged when
> they should have been?

That should not have been the case in Lucene 2.2 I think.


To unsubscribe, e-mail:
For additional commands, e-mail:

View raw message