lucene-java-user mailing list archives

Site index · List index
Message view « Date » · « Thread »
Top « Date » · « Thread »
From "Gopikrishnan Subramani" <gopi.subram...@gmail.com>
Subject Re: index corruption with latest lucene
Date Tue, 06 May 2008 08:56:18 GMT
[ Sorry if I'm hijacking this thread, if you feel this error is unrelated to
this thread, I'll move this to a separate thread. ]

Even after upgrading to 2.3.1 I'm running into index corruption problems.
I'm posting below the exception that is generated while searching. The stack
trace looks like,


org.apache.lucene.index.CorruptIndexException: doc counts differ for segment
_kk: fieldsReader shows 72670 but segmentInfo shows 72671
        at
org.apache.lucene.index.SegmentReader.initialize(SegmentReader.java:313)
        at org.apache.lucene.index.SegmentReader.get(SegmentReader.java:262)
        at org.apache.lucene.index.SegmentReader.get(SegmentReader.java:230)
        at
org.apache.lucene.index.DirectoryIndexReader$1.doBody(DirectoryIndexReader.java:73)
        at
org.apache.lucene.index.SegmentInfos$FindSegmentsFile.run(SegmentInfos.java:636)
        at
org.apache.lucene.index.DirectoryIndexReader.open(DirectoryIndexReader.java:63)
        at org.apache.lucene.index.IndexReader.open(IndexReader.java:209)
        at org.apache.lucene.index.IndexReader.open(IndexReader.java:173)
        at
org.apache.lucene.search.IndexSearcher.<init>(IndexSearcher.java:48)


About the setup: all documents in the these indexes have the same set of
fields, but some fields are not added if the value is null. We have over 500
indexes and they are indexed incrementally on a daily basis. The index is
updated in place with autocommit turned on. A single thread writes to the
index and once all the documents are updated, the index is commited. About 5
indexes are getting corrupted per week on an average and a full index fixes
the problem. This is proving to be a lot of pain and any help in identifying
the problem is much appreicated.

thanks,
Gopi

On 5/6/08, Mark Miller <markrmiller@gmail.com> wrote:
>
> I am getting even more confused. I luckily found a copy of one of the
> corrupted test indices that i had made on 4/28/08...lucky as its the
> only one I have ever made :) It doesn't have the problem. This is very
> interesting to me, because the other site that has the problem has been
> in action for months now. Both were running with my previous version of
> Lucene, which was a trunk build from around when 2.3 was released I
> think. Just seems odd that the test index was corrupted so recently.
>
>
> So I am a bit stuck...its probably my own problem though, so unless
> someone else sees it, Ill just report back if/when I find out more.
>
> - Mark
>
>
> On Mon, 2008-05-05 at 18:07 -0400, Michael McCandless wrote:
> > Mark,
> >
> > Which exact version of the JRE are you using?
> >
> > Mike
> >
> > Mark Miller wrote:
> > > On Mon, 2008-05-05 at 17:26 -0400, Michael McCandless wrote:
> > >> Actually that stack trace looks like it's from trunk, not from 2.3.2
> > >> (pre)?  OK, I think you said it's from "post 2.3 trunk".
> > >
> > > Right...the Lucene that showed the problem was build from a trunk grab
> > > late last week. One of the problem indexes was built with a 2.0 or 2.1
> > > and the other was built with a post 2.3 trunk (but weeks (prob months)
> > > before the one i grabbed late last week :) )
> > >
> > >>
> > >> Another question: is autoCommit false or true?
> > > false
> > >
> > >
> > >
> > > If I can get you an affected index I will.
> > >
> > > - mark
> > >
> > >
> > >>
> > >> More responses below:
> > >>
> > >> Mark Miller wrote:
> > >>> On Mon, 2008-05-05 at 16:32 -0400, Michael McCandless wrote:
> > >>>> Hi Mark,
> > >>>>
> > >>>> Not good!
> > >>>>
> > >>>> Can you describe how this index was created?  Did you use multiple
> > >>>> threads on one IndexWriter?  Multiple sessions of IndexWriter
> > >>>> appending to the index?  addIndexes*?  Is the index copied from
one
> > >>>> place to another after being written and before being searched?
> > >>>
> > >>> Both sites were created by a single thread on a single IndexWriter.
> > >>> Updates are done through multiple threads and one IndexWriter. No
> > >>> addIndexes. Index was never copied, always same path.
> > >>>
> > >>>>
> > >>>> If you run CheckIndex, what does it report?
> > >>>
> > >>> This was my next move...unfortunately, someone accidentally kicked
> > >>> off a
> > >>> complete reindex before I could do it. From what I can tell by the
> > >>> stack
> > >>> trace, its a per doc problem...I am guessing I could have
> > >>> printed the
> > >>> ids of the problem docs and just reindex those? I have to deal with
> > >>> this
> > >>> at many other sites, so that may be my attack...I cannot reindex
> > >>> everything to fix.
> > >>
> > >> It would be great to know if that workaround works (and indeed it's a
> > >> per-doc issue).  I'd also love to know how many docs are affected,
> > >> when you hit this.
> > >>
> > >> If there's any way to zip up the index and send it to me, even just
> > >> the files for the one segment that has the corrupted doc, that'd be
> > >> great.
> > >>
> > >>>>
> > >>>> Any prior exceptions on this index?
> > >>>
> > >>> Not that I can recall. One of the indexes was made months ago, prob
> > >>> with
> > >>> a 2.0 or 2.1 Lucene, the second was made with a post 2.2 Lucene. One
> > >>> site was windows 2003, the other AIX. One site was only 30,000
> > >>> docs, the
> > >>> other over 1 million.
> > >>>
> > >>>>
> > >>>> Are your docs a variable schema (different fields)?
> > >>>
> > >>> Yes. Lots of different fields depending on the doc.
> > >>>
> > >>>>
> > >>>> Mike
> > >>>
> > >>> Thanks Mike. I am currently trying to duplicate this. I can't go to
> > >>> another site without testing some kind of fix.
> > >>>
> > >>>>
> > >>>> Mark Miller wrote:
> > >>>>> Yeah, its pretty close to 2.3.2, but I think from last week
mabye.
> > >>>>>
> > >>>>> I finally have one of the stack traces (this comes on the tail
> > >>>>> complete
> > >>>>> laptop failure so I am scrambling here)
> > >>>>>
> > >>>>> java.lang.IndexOutOfBoundsException: Index: 97, Size: 43
> > >>>>>         at java.util.ArrayList.RangeCheck(ArrayList.java:572)
> > >>>>>         at java.util.ArrayList.get(ArrayList.java:347)
> > >>>>>         at org.apache.lucene.index.FieldInfos.fieldInfo
> > >>>>> (FieldInfos.java:260)
> > >>>>>         at org.apache.lucene.index.FieldsReader.doc
> > >>>>> (FieldsReader.java:184)
> > >>>>>         at org.apache.lucene.index.SegmentReader.document
> > >>>>> (SegmentReader.java:670)
> > >>>>>         at org.apache.lucene.index.MultiSegmentReader.document
> > >>>>> (MultiSegmentReader.java:257)
> > >>>>>         at org.apache.lucene.search.IndexSearcher.doc
> > >>>>> (IndexSearcher.java:97)
> > >>>>>
> > >>>>> On Mon, 2008-05-05 at 14:48 -0500, crspan wrote:
> > >>>>>> coincidence or it is from 2.3.2 ?
> > >>>>>>
> > >>>>>> env:
> > >>>>>> lucene 2.3.2
> > >>>>>> jdk1.6.0_06 & jdk1.5.0_15
> > >>>>>>
> > >>>>>>
> > >>>>>> QueryString:
> > >>>>>> illeg^30.820824 technolog^22.290413 transfer^33.307804
> > >>>>>> Error: java.lang.ArrayIndexOutOfBoundsException:
> > >>>>>> 132704java.lang.ArrayIndexOutOfBoundsException: 132704
> > >>>>>> at
> > >>>>>> org.apache.lucene.search.BooleanScorer2$Coordinator.coordFactor
> > >>>>>> (BooleanScorer2.java:55)
> > >>>>>> at org.apache.lucene.search.BooleanScorer2.score
> > >>>>>> (BooleanScorer2.java:358)
> > >>>>>> at org.apache.lucene.search.BooleanScorer2.score
> > >>>>>> (BooleanScorer2.java:320)
> > >>>>>> at org.apache.lucene.search.IndexSearcher.search
> > >>>>>> (IndexSearcher.java:146)
> > >>>>>> at org.apache.lucene.search.IndexSearcher.search
> > >>>>>> (IndexSearcher.java:113)
> > >>>>>> at org.apache.lucene.search.Searcher.search(Searcher.java:132)
> > >>>>>> at
> > >>>>>> org.cr.search.TrecQueryRelevanceFeedback.main
> > >>>>>> (TrecQueryRelevanceFeedback.java:776)
> > >>>>>>
> > >>>>>>
> > >>>>>> QueryString:
> > >>>>>> oceanograph^68.48028 vessel^43.191563
> > >>>>>> Error:
> > >>>>>> java.lang.ArrayIndexOutOfBoundsExceptionjava.lang.ArrayIndexOutOf
> > >>>>>> Bo
> > >>>>>> un
> > >>>>>> dsException
> > >>>>>> at java.lang.System.arraycopy(Native Method)
> > >>>>>> at
> > >>>>>> org.apache.lucene.index.TermVectorsReader.readTermVector
> > >>>>>> (TermVectorsReader.java:353)
> > >>>>>> at
> > >>>>>> org.apache.lucene.index.TermVectorsReader.readTermVectors
> > >>>>>> (TermVectorsReader.java:287)
> > >>>>>> at org.apache.lucene.index.TermVectorsReader.get
> > >>>>>> (TermVectorsReader.java:232)
> > >>>>>> at
> > >>>>>> org.apache.lucene.index.SegmentReader.getTermFreqVectors
> > >>>>>> (SegmentReader.java:981)
> > >>>>>> at org.cr.rf.RelevanceFeedback.RelFeedbackWeight
> > >>>>>> (RelevanceFeedback.java:134)
> > >>>>>> at
> > >>>>>> org.cr.search.TrecQueryRelevanceFeedback.main
> > >>>>>> (TrecQueryRelevanceFeedback.java:781)
> > >>>>>>
> > >>>>>>
> > >>>>>>
> > >>>>>>
> > >>>>>> Mark Miller wrote:
> > >>>>>>> Any recent changes that would expose index corruption?
> > >>>>>>>
> > >>>>>>> I am getting two new errors when trying to search:
> > >>>>>>>
> > >>>>>>> nullpointer fieldsreaders line 260
> > >>>>>>>
> > >>>>>>> indexoutofbounds on fieldinfo line 185
> > >>>>>>>
> > >>>>>>> I am kind of screwed, because reindexing fixes this,
but I cant
> > >>>>>>> reindex!
> > >>>>>>>
> > >>>>>>> Any ideas?
> > >>>>>>>
> > >>>>>>>
> > >>>>>>> ----------------------------------------------------------------
> > >>>>>>> --
> > >>>>>>> --
> > >>>>>>> -
> > >>>>>>> To unsubscribe, e-mail: java-user-unsubscribe@lucene.apache.org
> > >>>>>>> For additional commands, e-mail: java-user-
> > >>>>>>> help@lucene.apache.org
> > >>>>>>>
> > >>>>>>>
> > >>>>>>
> > >>>>>>
> > >>>>>> -----------------------------------------------------------------
> > >>>>>> --
> > >>>>>> --
> > >>>>>> To unsubscribe, e-mail: java-user-unsubscribe@lucene.apache.org
> > >>>>>> For additional commands, e-mail: java-user-help@lucene.apache.org
> > >>>>>>
> > >>>>>
> > >>>>>
> > >>>>> ------------------------------------------------------------------
> > >>>>> --
> > >>>>> -
> > >>>>> To unsubscribe, e-mail: java-user-unsubscribe@lucene.apache.org
> > >>>>> For additional commands, e-mail: java-user-help@lucene.apache.org
> > >>>>>
> > >>>>
> > >>>>
> > >>>> -------------------------------------------------------------------
> > >>>> --
> > >>>> To unsubscribe, e-mail: java-user-unsubscribe@lucene.apache.org
> > >>>> For additional commands, e-mail: java-user-help@lucene.apache.org
> > >>>>
> > >>>
> > >>>
> > >>> --------------------------------------------------------------------
> > >>> -
> > >>> To unsubscribe, e-mail: java-user-unsubscribe@lucene.apache.org
> > >>> For additional commands, e-mail: java-user-help@lucene.apache.org
> > >>>
> > >>
> > >>
> > >> ---------------------------------------------------------------------
> > >> To unsubscribe, e-mail: java-user-unsubscribe@lucene.apache.org
> > >> For additional commands, e-mail: java-user-help@lucene.apache.org
> > >>
> > >
> > >
> > > ---------------------------------------------------------------------
> > > To unsubscribe, e-mail: java-user-unsubscribe@lucene.apache.org
> > > For additional commands, e-mail: java-user-help@lucene.apache.org
> > >
> >
> >
> > ---------------------------------------------------------------------
> > To unsubscribe, e-mail: java-user-unsubscribe@lucene.apache.org
> > For additional commands, e-mail: java-user-help@lucene.apache.org
> >
>
>
> ---------------------------------------------------------------------
> To unsubscribe, e-mail: java-user-unsubscribe@lucene.apache.org
> For additional commands, e-mail: java-user-help@lucene.apache.org
>
>

Mime
  • Unnamed multipart/alternative (inline, None, 0 bytes)
View raw message