lucene-java-user mailing list archives

From Mark Miller <markrmil...@gmail.com>
Subject Re: index corruption with latest lucene
Date Mon, 05 May 2008 21:35:01 GMT
On Mon, 2008-05-05 at 17:26 -0400, Michael McCandless wrote:
> Actually that stack trace looks like it's from trunk, not from 2.3.2 
> (pre)?  OK, I think you said it's from "post 2.3 trunk".

Right...the Lucene that showed the problem was built from a trunk grab
late last week. One of the problem indexes was built with a 2.0 or 2.1
and the other was built with a post 2.3 trunk (but weeks, probably months,
before the one I grabbed late last week :) )

> 
> Another question: is autoCommit false or true?
false



If I can get you an affected index I will.
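
For anyone else who needs to ship just one segment, as Mike asks further down: a minimal sketch using only java.util.zip. It assumes the segment's files all sit in the index directory and are named `<segment>.<extension>` (for example `_3.fdt`, `_3.fnm`). The class name `SegmentZipper`, the method `zipSegment`, and the segment name `_3` are made up for illustration; files that do not follow that naming pattern are not picked up.

```java
import java.io.IOException;
import java.io.OutputStream;
import java.nio.file.DirectoryStream;
import java.nio.file.Files;
import java.nio.file.Path;
import java.util.ArrayList;
import java.util.Collections;
import java.util.List;
import java.util.zip.ZipEntry;
import java.util.zip.ZipOutputStream;

// Hypothetical helper, not part of Lucene.
class SegmentZipper {

    /**
     * Zips every regular file in indexDir whose name is "<segmentName>.<something>"
     * and returns the names that were added, in sorted order.
     */
    static List<String> zipSegment(Path indexDir, String segmentName, Path zipFile)
            throws IOException {
        List<Path> matches = new ArrayList<>();
        // The dot in the glob keeps "_3" from also matching "_30.fdt".
        try (DirectoryStream<Path> files =
                 Files.newDirectoryStream(indexDir, segmentName + ".*")) {
            for (Path p : files) {
                if (Files.isRegularFile(p)) {
                    matches.add(p);
                }
            }
        }
        Collections.sort(matches);

        List<String> names = new ArrayList<>();
        try (OutputStream out = Files.newOutputStream(zipFile);
             ZipOutputStream zip = new ZipOutputStream(out)) {
            for (Path p : matches) {
                String name = p.getFileName().toString();
                zip.putNextEntry(new ZipEntry(name));
                Files.copy(p, zip); // stream the file's bytes into the current entry
                zip.closeEntry();
                names.add(name);
            }
        }
        return names;
    }
}
```

Something like `SegmentZipper.zipSegment(indexDir, "_3", outputZip)` would then produce the archive; the index should not be written to while the copy runs.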

- mark
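
The per-doc workaround discussed in the quoted exchange below could be sketched roughly like this. It is plain Java with no Lucene dependency: `DocLoader` is a hypothetical stand-in for whatever call loads a document's stored fields (in the stack trace below that is `IndexSearcher.doc`), and `BadDocScan` / `findUnreadable` are illustration names, not Lucene API. It only finds the ids that throw on load; whether reindexing just those ids repairs the index is still untested.

```java
import java.util.ArrayList;
import java.util.List;

// Hypothetical helper, not part of Lucene.
class BadDocScan {

    /** Stand-in for the call that loads the stored fields of one document. */
    interface DocLoader {
        Object load(int docId) throws Exception;
    }

    /** Tries every id in [0, maxDoc) and returns the ids whose load threw. */
    static List<Integer> findUnreadable(DocLoader loader, int maxDoc) {
        List<Integer> bad = new ArrayList<>();
        for (int docId = 0; docId < maxDoc; docId++) {
            try {
                loader.load(docId);
            } catch (Exception e) {
                // Covers both failures reported in this thread:
                // IndexOutOfBoundsException and NullPointerException.
                bad.add(docId);
            }
        }
        return bad;
    }

    public static void main(String[] args) {
        // Fake loader that fails on two ids, to show the shape of the result.
        DocLoader fake = docId -> {
            if (docId == 2 || docId == 5) {
                throw new IndexOutOfBoundsException("Index: 97, Size: 43");
            }
            return "doc-" + docId;
        };
        System.out.println(findUnreadable(fake, 8)); // prints [2, 5]
    }
}
```

With a real index the loader would wrap the reader's document call, and deleted ids would probably need to be skipped first so they are not reported as corrupt.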


> 
> More responses below:
> 
> Mark Miller wrote:
> > On Mon, 2008-05-05 at 16:32 -0400, Michael McCandless wrote:
> >> Hi Mark,
> >>
> >> Not good!
> >>
> >> Can you describe how this index was created?  Did you use multiple
> >> threads on one IndexWriter?  Multiple sessions of IndexWriter
> >> appending to the index?  addIndexes*?  Is the index copied from one
> >> place to another after being written and before being searched?
> >
> > Both sites were created by a single thread on a single IndexWriter.
> > Updates are done through multiple threads and one IndexWriter. No
> > addIndexes. Index was never copied, always same path.
> >
> >>
> >> If you run CheckIndex, what does it report?
> >
> > This was my next move...unfortunately, someone accidentally kicked off a
> > complete reindex before I could do it. From what I can tell by the stack
> > trace, it's a per doc problem...I am guessing I could have printed the
> > ids of the problem docs and just reindexed those? I have to deal with this
> > at many other sites, so that may be my attack...I cannot reindex
> > everything to fix.
> 
> It would be great to know if that workaround works (and indeed it's a  
> per-doc issue).  I'd also love to know how many docs are affected,  
> when you hit this.
> 
> If there's any way to zip up the index and send it to me, even just  
> the files for the one segment that has the corrupted doc, that'd be  
> great.
> 
> >>
> >> Any prior exceptions on this index?
> >
> > Not that I can recall. One of the indexes was made months ago, prob with
> > a 2.0 or 2.1 Lucene, the second was made with a post 2.2 Lucene. One
> > site was Windows 2003, the other AIX. One site was only 30,000 docs, the
> > other over 1 million.
> >
> >>
> >> Are your docs a variable schema (different fields)?
> >
> > Yes. Lots of different fields depending on the doc.
> >
> >>
> >> Mike
> >
> > Thanks Mike. I am currently trying to duplicate this. I can't go to
> > another site without testing some kind of fix.
> >
> >>
> >> Mark Miller wrote:
> >>> Yeah, it's pretty close to 2.3.2, but I think from last week maybe.
> >>>
> >>> I finally have one of the stack traces (this comes on the tail of a
> >>> complete laptop failure, so I am scrambling here)
> >>>
> >>> java.lang.IndexOutOfBoundsException: Index: 97, Size: 43
> >>>         at java.util.ArrayList.RangeCheck(ArrayList.java:572)
> >>>         at java.util.ArrayList.get(ArrayList.java:347)
> >>>         at org.apache.lucene.index.FieldInfos.fieldInfo(FieldInfos.java:260)
> >>>         at org.apache.lucene.index.FieldsReader.doc(FieldsReader.java:184)
> >>>         at org.apache.lucene.index.SegmentReader.document(SegmentReader.java:670)
> >>>         at org.apache.lucene.index.MultiSegmentReader.document(MultiSegmentReader.java:257)
> >>>         at org.apache.lucene.search.IndexSearcher.doc(IndexSearcher.java:97)
> >>>
> >>> On Mon, 2008-05-05 at 14:48 -0500, crspan wrote:
> >>>> coincidence or it is from 2.3.2 ?
> >>>>
> >>>> env:
> >>>> lucene 2.3.2
> >>>> jdk1.6.0_06 & jdk1.5.0_15
> >>>>
> >>>>
> >>>> QueryString:
> >>>> illeg^30.820824 technolog^22.290413 transfer^33.307804
> >>>> Error: java.lang.ArrayIndexOutOfBoundsException: 132704
> >>>> java.lang.ArrayIndexOutOfBoundsException: 132704
> >>>> at org.apache.lucene.search.BooleanScorer2$Coordinator.coordFactor(BooleanScorer2.java:55)
> >>>> at org.apache.lucene.search.BooleanScorer2.score(BooleanScorer2.java:358)
> >>>> at org.apache.lucene.search.BooleanScorer2.score(BooleanScorer2.java:320)
> >>>> at org.apache.lucene.search.IndexSearcher.search(IndexSearcher.java:146)
> >>>> at org.apache.lucene.search.IndexSearcher.search(IndexSearcher.java:113)
> >>>> at org.apache.lucene.search.Searcher.search(Searcher.java:132)
> >>>> at org.cr.search.TrecQueryRelevanceFeedback.main(TrecQueryRelevanceFeedback.java:776)
> >>>>
> >>>>
> >>>> QueryString:
> >>>> oceanograph^68.48028 vessel^43.191563
> >>>> Error: java.lang.ArrayIndexOutOfBoundsException
> >>>> java.lang.ArrayIndexOutOfBoundsException
> >>>> at java.lang.System.arraycopy(Native Method)
> >>>> at org.apache.lucene.index.TermVectorsReader.readTermVector(TermVectorsReader.java:353)
> >>>> at org.apache.lucene.index.TermVectorsReader.readTermVectors(TermVectorsReader.java:287)
> >>>> at org.apache.lucene.index.TermVectorsReader.get(TermVectorsReader.java:232)
> >>>> at org.apache.lucene.index.SegmentReader.getTermFreqVectors(SegmentReader.java:981)
> >>>> at org.cr.rf.RelevanceFeedback.RelFeedbackWeight(RelevanceFeedback.java:134)
> >>>> at org.cr.search.TrecQueryRelevanceFeedback.main(TrecQueryRelevanceFeedback.java:781)
> >>>>
> >>>>
> >>>>
> >>>>
> >>>> Mark Miller wrote:
> >>>>> Any recent changes that would expose index corruption?
> >>>>>
> >>>>> I am getting two new errors when trying to search:
> >>>>>
> >>>>> nullpointer fieldsreaders line 260
> >>>>>
> >>>>> indexoutofbounds on fieldinfo line 185
> >>>>>
> >>>>> I am kind of screwed, because reindexing fixes this, but I can't
> >>>>> reindex!
> >>>>>
> >>>>> Any ideas?
> >>>>>
> >>>>>
> >>>>> ---------------------------------------------------------------------
> >>>>> To unsubscribe, e-mail: java-user-unsubscribe@lucene.apache.org
> >>>>> For additional commands, e-mail: java-user-help@lucene.apache.org