lucene-java-user mailing list archives

From Mark Miller <markrmil...@gmail.com>
Subject Re: index corruption with latest lucene
Date Mon, 05 May 2008 21:00:46 GMT
On Mon, 2008-05-05 at 16:32 -0400, Michael McCandless wrote:
> Hi Mark,
> 
> Not good!
> 
> Can you describe how this index was created?  Did you use multiple  
> threads on one IndexWriter?  Multiple sessions of IndexWriter  
> appending to the index?  addIndexes*?  Is the index copied from one  
> place to another after being written and before being searched?

Both sites were created by a single thread on a single IndexWriter.
Updates are done through multiple threads and one IndexWriter. No
addIndexes. Index was never copied, always same path.

> 
> If you run CheckIndex, what does it report?

This was my next move... unfortunately, someone accidentally kicked off a
complete reindex before I could run it. From what I can tell from the stack
trace, it's a per-doc problem... I am guessing I could have printed the
ids of the problem docs and just reindexed those? I have to deal with this
at many other sites, so that may be my approach... I cannot reindex
everything to fix it.
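
A rough sketch of that "find the problem docs" idea, under stated assumptions: the class and method names below are hypothetical and not part of Lucene, and the `loadDoc` callback is only a stand-in for a real call such as `reader.document(id)`. Real code would also have to skip deleted docs and wrap the checked `IOException` that Lucene can throw.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.function.IntFunction;

// Hypothetical helper for illustration only; not part of Lucene.
class ProblemDocScan {

    /**
     * Tries to load every doc id in [0, maxDoc) and returns the ids whose
     * load threw a runtime exception (for example the
     * IndexOutOfBoundsException or NullPointerException seen in this thread).
     *
     * loadDoc is an assumed stand-in for something like reader.document(id).
     */
    static List<Integer> findUnreadableDocs(int maxDoc, IntFunction<?> loadDoc) {
        List<Integer> bad = new ArrayList<>();
        for (int id = 0; id < maxDoc; id++) {
            try {
                loadDoc.apply(id);   // the result is discarded; only success matters
            } catch (RuntimeException e) {
                bad.add(id);         // remember the doc so it can be reindexed alone
            }
        }
        return bad;
    }
}
```

The returned ids could then be mapped back to application-level keys and only those docs reindexed. Whether that is enough depends on the corruption really being limited to stored fields of individual docs, which this thread has not yet established.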

> 
> Any prior exceptions on this index?

Not that I can recall. One of the indexes was made months ago, probably with
a 2.0 or 2.1 Lucene; the second was made with a post-2.2 Lucene. One
site was Windows 2003, the other AIX. One site was only 30,000 docs, the
other over 1 million.

> 
> Are your docs a variable schema (different fields)?

Yes. Lots of different fields depending on the doc.

> 
> Mike

Thanks Mike. I am currently trying to duplicate this. I can't go to
another site without testing some kind of fix.

> 
> Mark Miller wrote:
> > Yeah, it's pretty close to 2.3.2, but I think from last week maybe.
> >
> > I finally have one of the stack traces (this comes on the tail of a
> > complete laptop failure, so I am scrambling here)
> >
> > java.lang.IndexOutOfBoundsException: Index: 97, Size: 43
> >         at java.util.ArrayList.RangeCheck(ArrayList.java:572)
> >         at java.util.ArrayList.get(ArrayList.java:347)
> >         at org.apache.lucene.index.FieldInfos.fieldInfo(FieldInfos.java:260)
> >         at org.apache.lucene.index.FieldsReader.doc(FieldsReader.java:184)
> >         at org.apache.lucene.index.SegmentReader.document(SegmentReader.java:670)
> >         at org.apache.lucene.index.MultiSegmentReader.document(MultiSegmentReader.java:257)
> >         at org.apache.lucene.search.IndexSearcher.doc(IndexSearcher.java:97)
> >
> > On Mon, 2008-05-05 at 14:48 -0500, crspan wrote:
> >> coincidence or it is from 2.3.2 ?
> >>
> >> env:
> >> lucene 2.3.2
> >> jdk1.6.0_06 & jdk1.5.0_15
> >>
> >>
> >> QueryString:
> >> illeg^30.820824 technolog^22.290413 transfer^33.307804
> >> Error: java.lang.ArrayIndexOutOfBoundsException: 132704
> >> java.lang.ArrayIndexOutOfBoundsException: 132704
> >> at org.apache.lucene.search.BooleanScorer2$Coordinator.coordFactor(BooleanScorer2.java:55)
> >> at org.apache.lucene.search.BooleanScorer2.score(BooleanScorer2.java:358)
> >> at org.apache.lucene.search.BooleanScorer2.score(BooleanScorer2.java:320)
> >> at org.apache.lucene.search.IndexSearcher.search(IndexSearcher.java:146)
> >> at org.apache.lucene.search.IndexSearcher.search(IndexSearcher.java:113)
> >> at org.apache.lucene.search.Searcher.search(Searcher.java:132)
> >> at org.cr.search.TrecQueryRelevanceFeedback.main(TrecQueryRelevanceFeedback.java:776)
> >>
> >>
> >> QueryString:
> >> oceanograph^68.48028 vessel^43.191563
> >> Error: java.lang.ArrayIndexOutOfBoundsException
> >> java.lang.ArrayIndexOutOfBoundsException
> >> at java.lang.System.arraycopy(Native Method)
> >> at org.apache.lucene.index.TermVectorsReader.readTermVector(TermVectorsReader.java:353)
> >> at org.apache.lucene.index.TermVectorsReader.readTermVectors(TermVectorsReader.java:287)
> >> at org.apache.lucene.index.TermVectorsReader.get(TermVectorsReader.java:232)
> >> at org.apache.lucene.index.SegmentReader.getTermFreqVectors(SegmentReader.java:981)
> >> at org.cr.rf.RelevanceFeedback.RelFeedbackWeight(RelevanceFeedback.java:134)
> >> at org.cr.search.TrecQueryRelevanceFeedback.main(TrecQueryRelevanceFeedback.java:781)
> >>
> >>
> >>
> >>
> >> Mark Miller wrote:
> >>> Any recent changes that would expose index corruption?
> >>>
> >>> I am getting two new errors when trying to search:
> >>>
> >>> nullpointer fieldsreaders line 260
> >>>
> >>> indexoutofbounds on fieldinfo line 185
> >>>
> >>> I am kind of screwed, because reindexing fixes this, but I can't
> >>> reindex!
> >>>
> >>> Any ideas?
> >>>
> >>>
> >>> ---------------------------------------------------------------------
> >>> To unsubscribe, e-mail: java-user-unsubscribe@lucene.apache.org
> >>> For additional commands, e-mail: java-user-help@lucene.apache.org