lucene-dev mailing list archives

From Shane O'Sullivan <shaneosulliv...@gmail.com>
Subject RE: corrupted index
Date Mon, 12 Sep 2005 10:49:05 GMT
I've been looking into whether it is possible to check a Lucene index for
corruption, regardless of how the corruption occurs (JVM crashes, bad file
copying, or anything else). I found an old thread on the subject in this
mailing list from before Lucene 1.2, over three years ago, in which it was
suggested that a corruption-checking tool might be written. Does anyone know
if anything came of this?

Thanks

Shane 

----------------------------------------------------------------------------------------------------
-----Original Message-----
From: Otis Gospodnetic [mailto:otis_gospodnetic@yahoo.com]
Sent: Monday, April 02, 2002 11:51:42 GMT
To: lucene-dev@jakarta.apache.org
Cc: matt@jivesoftware.com
Subject: RE: corrupted index
Doug,

Yep, I think waiting until after 1.2 would be a good idea. As I find
time over the next couple of weeks, I'll try to start putting together a
proposal.

A good short-term improvement would be to document the usage of
IOException in the Javadocs and explain when it might occur.

As for subclassing IOException, that sounds like it could be a good
approach.
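A minimal sketch of the compatibility point Doug makes in the quoted message: if the proposed exceptions extend java.io.IOException, code that already declares or catches IOException keeps compiling, while new code can catch individual failure modes. The three class names come from Matt's proposal and are hypothetical here, not Lucene API; `openIndex` and the `write.lock` file name are illustrative stand-ins. The Javadoc `@throws` tags also show the kind of IOException documentation suggested above.

```java
import java.io.File;
import java.io.IOException;

public class IndexExceptionsSketch {

    // Hypothetical exception types named after Matt Tucker's proposal (not Lucene API).
    // Extending IOException keeps existing "throws IOException" signatures valid.
    static class IndexNotFoundException extends IOException {
        IndexNotFoundException(String msg) { super(msg); }
    }

    static class IndexLockedException extends IOException {
        IndexLockedException(String msg) { super(msg); }
    }

    static class IndexCorruptException extends IOException {
        IndexCorruptException(String msg) { super(msg); }
    }

    /**
     * Hypothetical stand-in for an index-opening call; the checks are illustrative only.
     *
     * @throws IndexNotFoundException if {@code dir} is not a directory
     * @throws IndexLockedException   if an (assumed) write.lock file is present
     * @throws IOException            for any other I/O failure
     */
    static void openIndex(File dir) throws IOException {
        if (!dir.isDirectory()) {
            throw new IndexNotFoundException("no index at " + dir);
        }
        if (new File(dir, "write.lock").exists()) {
            throw new IndexLockedException("index is locked: " + dir);
        }
        // ... real code would read the index files here ...
    }

    // Old-style caller: compiles unchanged, because every new type is an IOException.
    static String legacyCaller(File dir) {
        try {
            openIndex(dir);
            return "opened";
        } catch (IOException e) {
            return "failed: " + e.getMessage();
        }
    }

    // New-style caller: reacts to each failure mode separately.
    static String newCaller(File dir) {
        try {
            openIndex(dir);
            return "opened";
        } catch (IndexLockedException e) {
            return "locked";      // e.g. wait and retry, or remove a stale lock
        } catch (IndexNotFoundException e) {
            return "not found";   // e.g. create a fresh index
        } catch (IOException e) {
            return "io error";    // everything else, including IndexCorruptException
        }
    }

    public static void main(String[] args) {
        File missing = new File("no-such-index-dir");
        System.out.println(legacyCaller(missing));
        System.out.println(newCaller(missing));
    }
}
```

The catch clauses for the subclasses must come before the general `catch (IOException e)`, otherwise the compiler rejects them as unreachable.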

Regards,
Matt

> -----Original Message-----
> From: Doug Cutting [mailto:DCutting@grandcentral.com] 
> Sent: Tuesday, April 02, 2002 11:24 AM
> To: 'Lucene Developers List'
> Subject: RE: corrupted index
> 
> 
> Matt,
> 
> I'd welcome a concrete proposal in this area. Probably we 
> should wait until we have a final 1.2 release out there 
> before making such changes. Note that this could be done 
> compatibly if the new exceptions subclass java.io.IOException.
> 
> Doug
> 
> > -----Original Message-----
> > From: Otis Gospodnetic [mailto:otis_gospodnetic@yahoo.com]
> > Sent: Monday, April 01, 2002 9:06 PM
> > To: lucene-dev@jakarta.apache.org
> > Cc: matt@jivesoftware.com
> > Subject: RE: corrupted index
> > 
> > 
> > I changed the recipient from -user to -dev list, as that seems more
> > appropriate. I think this would not be a bad idea, if we do it right.
> > Things like IndexLockedException, etc. sound alright to me.
> > I think Doug once welcomed such a change on one of the lists, too.
> > 
> > Perhaps a list of suggested exceptions, new exception classes and 
> > appropriate patches would be the best contribution.
> > 
> > Thanks,
> > Otis
> > 
> > --- Matt Tucker <matt@jivesoftware.com> wrote:
> > > Hey all,
> > > 
> > > Actually, using shutdown hooks might not be the best idea, since
> > > Lucene is very often used in server-side Java environments. Many
> > > app servers throw security errors when code tries to add shutdown
> > > hooks, and I've seen WebLogic crash when a webapp registered them.
> > > Has anyone else run into this?
> > > 
> > > This all brings up a key issue with Lucene, which is that there is
> > > little way to recover from errors gracefully. I'd love to see a
> > > number of checked exceptions added. For example:
> > > 
> > > IndexNotFoundException -- when trying to open an index that doesn't exist
> > > IndexLockedException -- when a lock file prevents you from getting an index
> > > IndexCorruptException -- maybe this would be thrown when an index
> > >     appears to be broken?
> > > 
> > > At the moment, Lucene throws many undocumented IOExceptions and even
> > > NullPointerExceptions when an error case comes up. I catch these in
> > > my app, but there's really not an intelligent way to recover from
> > > them. Adding checked exceptions would be a change of the API, but it
> > > seems worth it. I'd be happy to make a more specific proposal if
> > > other people feel like this would be a worthwhile direction to go in.
> > > 
> > > Regards,
> > > Matt
> > > 
> > > Quoting "Spencer, Dave" <dave@lumos.com>:
> > > 
> > > > Runtime.addShutdownHook:
> > > > 
> > > > http://java.sun.com/j2se/1.3/docs/api/java/lang/Runtime.html#addShutdownHook(java.lang.Thread)
> > > > 
> > > > -----Original Message-----
> > > > From: Otis Gospodnetic [mailto:otis_gospodnetic@yahoo.com]
> > > > Sent: Sunday, March 17, 2002 12:06 AM
> > > > To: Lucene Users List
> > > > Subject: Re: corrupted index
> > > > 
> > > > 
> > > > Oh, I just thought of something (wine does body good). Perhaps one
> > > > could use Runtime (the class) to catch the JVM shutdown and do
> > > > whatever is needed to prevent index corruption. I believe there are
> > > > some shutdown hook methods in there that may let you do that. I'm
> > > > too lazy to look up the API docs now, but I remember reading about
> > > > that once, and perhaps it was even mentioned on one of the two
> > > > Lucene mailing lists.
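A minimal sketch of the shutdown-hook idea, using the `Runtime.addShutdownHook` method Spencer links to. Because Matt reports that some app servers refuse hooks, registration is wrapped so that a `SecurityException` (thrown when a security manager denies the "shutdownHooks" permission) is reported instead of propagating. The `Closeable` stands in for an index writer and `closeOnShutdown` is a hypothetical helper. A hook only runs on an orderly JVM exit (normal termination, `System.exit`, or an interrupt such as Ctrl-C), not on a hard crash or `kill -9`, so it narrows the window for corruption rather than removing it.

```java
import java.io.Closeable;
import java.io.IOException;

public class ShutdownHookSketch {

    // Registers a hook that closes the given resource (a stand-in for an index writer)
    // when the JVM shuts down in an orderly way. Returns false if the environment,
    // e.g. an app server with a restrictive security manager, refuses the hook.
    static boolean closeOnShutdown(final Closeable writer) {
        Thread hook = new Thread(() -> {
            try {
                writer.close();
            } catch (IOException e) {
                System.err.println("close on shutdown failed: " + e);
            }
        });
        try {
            Runtime.getRuntime().addShutdownHook(hook);
            return true;
        } catch (SecurityException e) {
            // The caller must then close the writer explicitly, e.g. from a servlet's destroy().
            System.err.println("shutdown hook not permitted: " + e.getMessage());
            return false;
        }
    }

    public static void main(String[] args) {
        Closeable fakeWriter = () -> System.out.println("writer closed");
        boolean registered = closeOnShutdown(fakeWriter);
        System.out.println("hook registered: " + registered);
        // When main returns, the JVM exits normally and the hook runs.
    }
}
```

In a webapp, closing the writer from the container's own lifecycle callback avoids the hook entirely, which sidesteps the security errors Matt describes.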
> > > > 
> > > > On the other hand, it would be great to have a tool that can verify
> > > > an existing index. I don't know enough about the actual file
> > > > structure yet to write something like that, but maybe somebody else
> > > > has done that already or would like to contribute.
> > > > 
> > > > Otis
> > > > 
> > > > 
> > > > --- "Steven J. Owens" <puffmail@darksleep.com> wrote:
> > > > > Otis,
> > > > >
> > > > > > You can remove the .lock file and try re-indexing or continuing
> > > > > > indexing where you left off.
> > > > > > I am not sure about the corrupt index. I have never seen it
> > > > > > happen, and I believe I recall reading some messages from Doug
> > > > > > Cutting saying that an index should never be left in an
> > > > > > inconsistent state.
> > > > >
> > > > > Obviously it never "should" be, but if something's pulling the rug
> > > > > out from under his JRE, changes could be only partially written,
> > > > > right?
> > > > >
> > > > > Or is the writing format in some sense transactionally safe?
> > > > > I've never worked directly on something like this, but I worked at
> > > > > a database software company where they used transaction semantics
> > > > > and a journaling scheme to fake a "bulletproof" file system. Is
> > > > > this how the index-writing code is implemented?
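Steven's question is whether index writes are transactional. One common way to get that property without a full journal is to write new data to a temporary file, force it to disk, and then atomically rename it over the old file, so a reader sees either the complete old version or the complete new one. Below is a minimal sketch of that general technique with `java.nio.file`; it is not taken from Lucene's code, and the `commit` helper, the `.new` suffix, and the `state` file name are illustrative assumptions.

```java
import java.io.IOException;
import java.nio.channels.FileChannel;
import java.nio.charset.StandardCharsets;
import java.nio.file.Files;
import java.nio.file.Path;
import java.nio.file.StandardCopyOption;
import java.nio.file.StandardOpenOption;

public class AtomicCommitSketch {

    // Writes content to a temporary sibling file, forces it to disk, then renames it
    // over the target. A crash before the rename leaves the old target untouched;
    // a crash after it leaves the new target complete.
    static void commit(Path target, String content) throws IOException {
        Path tmp = target.resolveSibling(target.getFileName() + ".new");
        Files.write(tmp, content.getBytes(StandardCharsets.UTF_8));
        try (FileChannel ch = FileChannel.open(tmp, StandardOpenOption.WRITE)) {
            ch.force(true); // flush the temp file's contents before it becomes visible
        }
        // ATOMIC_MOVE typically maps to rename(2) on POSIX systems. Whether an existing
        // target is replaced is implementation-specific per the Files.move Javadoc.
        // A fully durable version would also fsync the parent directory afterwards.
        Files.move(tmp, target, StandardCopyOption.ATOMIC_MOVE);
    }

    public static void main(String[] args) throws IOException {
        Path dir = Files.createTempDirectory("commit-sketch");
        Path state = dir.resolve("state"); // illustrative file name
        commit(state, "version 1");
        commit(state, "version 2");
        System.out.println(new String(Files.readAllBytes(state), StandardCharsets.UTF_8));
    }
}
```

`ATOMIC_MOVE` only works within a single file system; otherwise `Files.move` throws `AtomicMoveNotSupportedException`, which is why the temporary file is created next to the target.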
> > > > >
> > > > > In general, I can guess Doug's response: just torch the old index
> > > > > directory and rebuild it; Lucene's indexing is fast enough that you
> > > > > don't need to get clever. This seems to be Doug's stance in general
> > > > > (i.e. "don't get fancy, I already put all the fanciness you'll need
> > > > > into extremely fast indexing and searching"). So far, it seems to
> > > > > work :-).
> > > > >
> > > > > > I could be making this up, though, so I suggest you search
> > > > > > through the lucene-user and lucene-dev archives on
> > > > > > www.mail-archive.com. A search for "corrupt" should do it. Once
> > > > > > you figure things out, maybe you can post a summary here.
> > > > >
> > > > > I got a little curious, so I went and did the searches. There is
> > > > > exactly one message in each list archive (dev and users) with the
> > > > > keyword "corrupt" in it. The lucene-users instance is irrelevant:
> > > > >
> > > > > http://www.mail-archive.com/lucene-user@jakarta.apache.org/msg00557.html
> > > > >
> > > > > The lucene-dev instance is more useful:
> > > > >
> > > > > http://www.mail-archive.com/lucene-dev@jakarta.apache.org/msg00157.html
> > > > >
> > > > > It's a post from Doug, dated Sept. 27, 2001, about adding not just
> > > > > thread-safety but process-safety:
> > > > >
> > > > > It should be impossible to corrupt an index through the Lucene API.
> > > > > However if a Lucene process exits unexpectedly it can leave the
> > > > > index locked. The remedy is simply to, at a time when it is certain
> > > > > that no processes are accessing the index, remove all lock files.
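Doug's remedy can be scripted. A minimal sketch, assuming (a) you have already established that no process is using the index, which this code cannot verify, and (b) the lock files sit directly in the index directory and end in `.lock`, like the ".lock file" Otis mentions. Where a particular Lucene version actually keeps its lock files is an assumption to check against that version; `RemoveStaleLocks` and the default path `index` are hypothetical.

```java
import java.io.IOException;
import java.nio.file.DirectoryStream;
import java.nio.file.Files;
import java.nio.file.Path;
import java.nio.file.Paths;
import java.util.ArrayList;
import java.util.List;

public class RemoveStaleLocks {

    // Deletes every "*.lock" file directly inside indexDir and returns the paths removed.
    // Only safe when no process is reading or writing the index; nothing here checks that.
    static List<Path> removeLockFiles(Path indexDir) throws IOException {
        List<Path> candidates = new ArrayList<>();
        try (DirectoryStream<Path> stream = Files.newDirectoryStream(indexDir, "*.lock")) {
            for (Path p : stream) {
                candidates.add(p); // collect first, delete after the stream is closed
            }
        }
        List<Path> removed = new ArrayList<>();
        for (Path p : candidates) {
            if (Files.deleteIfExists(p)) {
                removed.add(p);
            }
        }
        return removed;
    }

    public static void main(String[] args) throws IOException {
        Path indexDir = Paths.get(args.length > 0 ? args[0] : "index"); // hypothetical default path
        for (Path p : removeLockFiles(indexDir)) {
            System.out.println("removed " + p.getFileName());
        }
    }
}
```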
> > > > >
> > > > > So it sounds like it's worth trying just removing the lock files.
> > > > > Hm, is there a way to come up with a "sanity check" you can run on
> > > > > an index to make sure it's not corrupted? This might be an excellent
> > > > > thing to reassure yourself with: something went wrong? Run a sanity
> > > > > check; if it fails, just reindex.
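Steven's check-then-reindex idea, combined with the "torch the old index directory and rebuild it" approach he attributes to Doug, could be wired up roughly as below. This is a sketch under stated assumptions: `IndexCheck` and `Rebuilder` are hypothetical interfaces, because what a real check should verify depends on Lucene's file format (one cheap option is to open the index and read every document, treating any exception as a failure). The toy check and rebuild in `main` only look for a marker file.

```java
import java.io.IOException;
import java.nio.file.Files;
import java.nio.file.Path;
import java.util.Comparator;
import java.util.List;
import java.util.stream.Collectors;
import java.util.stream.Stream;

public class CheckOrRebuild {

    // Hypothetical hooks: what counts as "healthy" and how to rebuild are application decisions.
    interface IndexCheck { boolean looksHealthy(Path indexDir); }
    interface Rebuilder { void rebuildInto(Path emptyDir) throws IOException; }

    // If the index directory is missing or fails the check, delete it entirely and rebuild.
    // Returns true if a rebuild happened. A crash mid-rebuild leaves a directory that
    // should fail the next check, which then triggers another rebuild.
    static boolean ensureUsable(Path indexDir, IndexCheck check, Rebuilder rebuilder)
            throws IOException {
        if (Files.isDirectory(indexDir) && check.looksHealthy(indexDir)) {
            return false;
        }
        deleteRecursively(indexDir);
        Files.createDirectories(indexDir);
        rebuilder.rebuildInto(indexDir);
        return true;
    }

    // Deletes a directory tree, children before parents (reverse lexicographic order).
    static void deleteRecursively(Path root) throws IOException {
        if (!Files.exists(root)) {
            return;
        }
        List<Path> paths;
        try (Stream<Path> walk = Files.walk(root)) {
            paths = walk.sorted(Comparator.reverseOrder()).collect(Collectors.toList());
        }
        for (Path p : paths) {
            Files.delete(p);
        }
    }

    public static void main(String[] args) throws IOException {
        Path dir = Files.createTempDirectory("check-sketch").resolve("index");
        // Toy check and rebuild for demonstration; a real check would inspect the index itself.
        IndexCheck check = d -> Files.exists(d.resolve("toy-marker"));
        Rebuilder rebuild = d -> Files.createFile(d.resolve("toy-marker"));
        System.out.println(ensureUsable(dir, check, rebuild)); // true: no index yet
        System.out.println(ensureUsable(dir, check, rebuild)); // false: check passes
    }
}
```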
> > > > >
> > > > > Steven J. Owens
> > > > > puff@darksleep.com
