lucene-java-user mailing list archives

From Mark Miller <markrmil...@gmail.com>
Subject Re: index corruption with latest lucene
Date Mon, 05 May 2008 22:02:44 GMT
Man, I have even confused myself on these versions at this point. Let me
start over.

I am having the problem with a version of Lucene that was trunk as of late
last week, which pretty much means 2.3.2.

I'd hate to hold up the release if the problem was only me, though. I am
trying to work through it as fast as I can. I just have to find another
index somewhere with the problem. It's just difficult because the indexes
are very large and on remote live sites. I am hoping I can find another
old test index with the problem, or make one. The two installs where I
detected the problem were rebuilt, one inadvertently.

- Mark

On Mon, 2008-05-05 at 14:32 -0700, Michael Busch wrote:
> If that is the case, should I go ahead and publish the 2.3.2 release?
> Have you seen this on 2.3.x, Mark?
> 
> -Michael
> 
> Michael McCandless wrote:
> > 
> > Actually that stack trace looks like it's from trunk, not from 
> > 2.3.2(pre)?  OK, I think you said it's from "post 2.3 trunk".
> > 
> > Another question: is autoCommit false or true?
> > 
> > More responses below:
> > 
> > Mark Miller wrote:
> >> On Mon, 2008-05-05 at 16:32 -0400, Michael McCandless wrote:
> >>> Hi Mark,
> >>>
> >>> Not good!
> >>>
> >>> Can you describe how this index was created?  Did you use multiple
> >>> threads on one IndexWriter?  Multiple sessions of IndexWriter
> >>> appending to the index?  addIndexes*?  Is the index copied from one
> >>> place to another after being written and before being searched?
> >>
> >> Both sites were created by a single thread on a single IndexWriter.
> >> Updates are done through multiple threads and one IndexWriter. No
> >> addIndexes. Index was never copied, always same path.
> >>
> >>>
> >>> If you run CheckIndex, what does it report?
> >>
> >> This was my next move... unfortunately, someone accidentally kicked off a
> >> complete reindex before I could do it. From what I can tell by the stack
> >> trace, it's a per-doc problem... I am guessing I could have printed the
> >> ids of the problem docs and just reindexed those? I have to deal with this
> >> at many other sites, so that may be my attack... I cannot reindex
> >> everything to fix it.
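One way to act on the per-doc idea is to try loading every document and record the numbers that throw. The Java below is a minimal sketch, not a Lucene API: `DocLoader` is a hypothetical interface standing in for calls like `IndexReader.maxDoc()`, `isDeleted(int)` and `document(int)`, and the scanner collects internal doc numbers only. Since a broken doc's stored fields may be unreadable, mapping those numbers back to application ids would need a source that does not go through the damaged stored fields (an assumption, untested here).

```java
import java.util.ArrayList;
import java.util.List;

public class BadDocScanner {

    // Hypothetical abstraction over "load the stored fields of doc n";
    // it is NOT a Lucene interface, just a stand-in so the sketch is self-contained.
    public interface DocLoader {
        int maxDoc();
        boolean isDeleted(int docNum);
        Object load(int docNum) throws Exception;
    }

    // Tries every non-deleted doc number and returns the ones whose load threw.
    public static List<Integer> findBadDocs(DocLoader loader) {
        List<Integer> bad = new ArrayList<Integer>();
        for (int n = 0; n < loader.maxDoc(); n++) {
            if (loader.isDeleted(n)) {
                continue; // deleted docs are skipped, never loaded
            }
            try {
                loader.load(n);
            } catch (Exception e) {
                // IndexOutOfBoundsException and NullPointerException are RuntimeExceptions,
                // so catching Exception also covers the errors reported in this thread.
                bad.add(n);
            }
        }
        return bad;
    }

    public static void main(String[] args) {
        // Fake 10-doc index: doc 5 is deleted, docs 3 and 7 fail to load.
        DocLoader fake = new DocLoader() {
            public int maxDoc() { return 10; }
            public boolean isDeleted(int docNum) { return docNum == 5; }
            public Object load(int docNum) {
                if (docNum == 3 || docNum == 7) {
                    throw new IndexOutOfBoundsException("Index: 97, Size: 43");
                }
                return "doc" + docNum;
            }
        };
        System.out.println(findBadDocs(fake)); // prints [3, 7]
    }
}
```

The size of the returned list is also a rough count of how many docs are affected.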
> > 
> > It would be great to know if that workaround works (and whether it is indeed a 
> > per-doc issue).  I'd also love to know how many docs are affected when 
> > you hit this.
> > 
> > If there's any way to zip up the index and send it to me, even just the 
> > files for the one segment that has the corrupted doc, that'd be great.
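To send just one segment's files, a small stdlib-only Java sketch can zip every file in the index directory whose name starts with a given prefix. The prefix (for example `_3.`) and the paths are hypothetical; which files actually belong to a segment depends on how the index was written (for instance, whether compound `.cfs` files are used), so checking the directory listing first seems wise.

```java
import java.io.File;
import java.io.FileOutputStream;
import java.io.IOException;
import java.nio.file.Files;
import java.util.zip.ZipEntry;
import java.util.zip.ZipOutputStream;

public class ZipSegment {

    // Zips every regular file in indexDir whose name starts with prefix
    // (e.g. a hypothetical segment name such as "_3.") into zipFile.
    // Returns how many files were added.
    public static int zipMatching(File indexDir, String prefix, File zipFile) throws IOException {
        File[] files = indexDir.listFiles();
        if (files == null) {
            throw new IOException("Not a readable directory: " + indexDir);
        }
        int count = 0;
        try (ZipOutputStream zip = new ZipOutputStream(new FileOutputStream(zipFile))) {
            for (File f : files) {
                if (!f.isFile() || !f.getName().startsWith(prefix)) {
                    continue; // skip subdirectories and files of other segments
                }
                zip.putNextEntry(new ZipEntry(f.getName()));
                Files.copy(f.toPath(), zip); // stream the file's bytes into the current entry
                zip.closeEntry();
                count++;
            }
        }
        return count;
    }

    public static void main(String[] args) throws IOException {
        if (args.length != 3) {
            System.err.println("usage: java ZipSegment <indexDir> <segmentPrefix> <out.zip>");
            System.exit(1);
        }
        int n = zipMatching(new File(args[0]), args[1], new File(args[2]));
        System.out.println("Zipped " + n + " file(s) into " + args[2]);
    }
}
```

For example, `java ZipSegment /path/to/index _3. segment.zip` would collect the hypothetical segment `_3` into `segment.zip`.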
> > 
> >>>
> >>> Any prior exceptions on this index?
> >>
> >> Not that I can recall. One of the indexes was made months ago, probably with
> >> Lucene 2.0 or 2.1; the second was made with a post-2.2 Lucene. One
> >> site was Windows 2003, the other AIX. One site had only 30,000 docs, the
> >> other over 1 million.
> >>
> >>>
> >>> Are your docs a variable schema (different fields)?
> >>
> >> Yes. Lots of different fields depending on the doc.
> >>
> >>>
> >>> Mike
> >>
> >> Thanks Mike. I am currently trying to duplicate this. I can't go to
> >> another site without testing some kind of fix.
> >>
> >>>
> >>> Mark Miller wrote:
> >>>> Yeah, it's pretty close to 2.3.2, but I think it's from last week, maybe.
> >>>>
> >>>> I finally have one of the stack traces (this comes on the tail of a
> >>>> complete laptop failure, so I am scrambling here)
> >>>>
> >>>> java.lang.IndexOutOfBoundsException: Index: 97, Size: 43
> >>>>         at java.util.ArrayList.RangeCheck(ArrayList.java:572)
> >>>>         at java.util.ArrayList.get(ArrayList.java:347)
> >>>>         at org.apache.lucene.index.FieldInfos.fieldInfo(FieldInfos.java:260)
> >>>>         at org.apache.lucene.index.FieldsReader.doc(FieldsReader.java:184)
> >>>>         at org.apache.lucene.index.SegmentReader.document(SegmentReader.java:670)
> >>>>         at org.apache.lucene.index.MultiSegmentReader.document(MultiSegmentReader.java:257)
> >>>>         at org.apache.lucene.search.IndexSearcher.doc(IndexSearcher.java:97)
> >>>>
> >>>> On Mon, 2008-05-05 at 14:48 -0500, crspan wrote:
> >>>>> Coincidence, or is it from 2.3.2?
> >>>>>
> >>>>> env:
> >>>>> lucene 2.3.2
> >>>>> jdk1.6.0_06 & jdk1.5.0_15
> >>>>>
> >>>>>
> >>>>> QueryString:
> >>>>> illeg^30.820824 technolog^22.290413 transfer^33.307804
> >>>>> Error: java.lang.ArrayIndexOutOfBoundsException: 132704
> >>>>> java.lang.ArrayIndexOutOfBoundsException: 132704
> >>>>> at org.apache.lucene.search.BooleanScorer2$Coordinator.coordFactor(BooleanScorer2.java:55)
> >>>>> at org.apache.lucene.search.BooleanScorer2.score(BooleanScorer2.java:358)
> >>>>> at org.apache.lucene.search.BooleanScorer2.score(BooleanScorer2.java:320)
> >>>>> at org.apache.lucene.search.IndexSearcher.search(IndexSearcher.java:146)
> >>>>> at org.apache.lucene.search.IndexSearcher.search(IndexSearcher.java:113)
> >>>>> at org.apache.lucene.search.Searcher.search(Searcher.java:132)
> >>>>> at org.cr.search.TrecQueryRelevanceFeedback.main(TrecQueryRelevanceFeedback.java:776)
> >>>>>
> >>>>>
> >>>>> QueryString:
> >>>>> oceanograph^68.48028 vessel^43.191563
> >>>>> Error: java.lang.ArrayIndexOutOfBoundsException
> >>>>> java.lang.ArrayIndexOutOfBoundsException
> >>>>> at java.lang.System.arraycopy(Native Method)
> >>>>> at org.apache.lucene.index.TermVectorsReader.readTermVector(TermVectorsReader.java:353)
> >>>>> at org.apache.lucene.index.TermVectorsReader.readTermVectors(TermVectorsReader.java:287)
> >>>>> at org.apache.lucene.index.TermVectorsReader.get(TermVectorsReader.java:232)
> >>>>> at org.apache.lucene.index.SegmentReader.getTermFreqVectors(SegmentReader.java:981)
> >>>>> at org.cr.rf.RelevanceFeedback.RelFeedbackWeight(RelevanceFeedback.java:134)
> >>>>> at org.cr.search.TrecQueryRelevanceFeedback.main(TrecQueryRelevanceFeedback.java:781)
> >>>>>
> >>>>>
> >>>>>
> >>>>>
> >>>>> Mark Miller wrote:
> >>>>>> Any recent changes that would expose index corruption?
> >>>>>>
> >>>>>> I am getting two new errors when trying to search:
> >>>>>>
> >>>>>> NullPointerException in FieldsReader, line 260
> >>>>>>
> >>>>>> IndexOutOfBoundsException in fieldInfo, line 185
> >>>>>>
> >>>>>> I am kind of screwed, because reindexing fixes this, but I can't
> >>>>>> reindex!
> >>>>>>
> >>>>>> Any ideas?
> >>>>>>
> >>>>>>
> >>>>>> ---------------------------------------------------------------------
> >>>>>> To unsubscribe, e-mail: java-user-unsubscribe@lucene.apache.org
> >>>>>> For additional commands, e-mail: java-user-help@lucene.apache.org