lucene-java-user mailing list archives

Site index · List index
Message view « Date » · « Thread »
Top « Date » · « Thread »
From Mark Miller <markrmil...@gmail.com>
Subject Re: index corruption with latest lucene
Date Mon, 05 May 2008 22:24:31 GMT
MB: Ah, thanks for clearing the version stuff up...I just assumed that
trunk last week was pretty close to 2.3.1. I am def trunk last thurs or
fri. Perhaps the problem is after 2.3.1, and perhaps the problem is only
with me.

MM: FYI- I upgraded a really old test install (hasnt been touched by a
new version of lucene since mid last year) and it had no issues in it.
The install was almost identical to the install I had problems with, the
difference being that the problem install was created and touched by
later versions of Lucene.

I often (foolishly/bravely) work off the trunk for features I want, so
its entirely possible I have created my own pain, and point release
users may not be affected.

The win 2003 java version is 1.6.0_5

The AIX version I cannot check right now, but is 1.5 something (prob one
version before the one they released a week or 3 ago.

- Mark


On Mon, 2008-05-05 at 18:07 -0400, Michael McCandless wrote:
> Mark,
> 
> Which exact version of the JRE are you using?
> 
> Mike
> 
> Mark Miller wrote:
> > On Mon, 2008-05-05 at 17:26 -0400, Michael McCandless wrote:
> >> Actually that stack trace looks like it's from trunk, not from 2.3.2
> >> (pre)?  OK, I think you said it's from "post 2.3 trunk".
> >
> > Right...the Lucene that showed the problem was build from a trunk grab
> > late last week. One of the problem indexes was built with a 2.0 or 2.1
> > and the other was built with a post 2.3 trunk (but weeks (prob months)
> > before the one i grabbed late last week :) )
> >
> >>
> >> Another question: is autoCommit false or true?
> > false
> >
> >
> >
> > If I can get you an affected index I will.
> >
> > - mark
> >
> >
> >>
> >> More responses below:
> >>
> >> Mark Miller wrote:
> >>> On Mon, 2008-05-05 at 16:32 -0400, Michael McCandless wrote:
> >>>> Hi Mark,
> >>>>
> >>>> Not good!
> >>>>
> >>>> Can you describe how this index was created?  Did you use multiple
> >>>> threads on one IndexWriter?  Multiple sessions of IndexWriter
> >>>> appending to the index?  addIndexes*?  Is the index copied from one
> >>>> place to another after being written and before being searched?
> >>>
> >>> Both sites were created by a single thread on a single IndexWriter.
> >>> Updates are done through multiple threads and one IndexWriter. No
> >>> addIndexes. Index was never copied, always same path.
> >>>
> >>>>
> >>>> If you run CheckIndex, what does it report?
> >>>
> >>> This was my next move...unfortunately, someone accidentally kicked
> >>> off a
> >>> complete reindex before I could do it. From what I can tell by the
> >>> stack
> >>> trace, its a per doc problem...I am guessing I could have   
> >>> printed the
> >>> ids of the problem docs and just reindex those? I have to deal with
> >>> this
> >>> at many other sites, so that may be my attack...I cannot reindex
> >>> everything to fix.
> >>
> >> It would be great to know if that workaround works (and indeed it's a
> >> per-doc issue).  I'd also love to know how many docs are affected,
> >> when you hit this.
> >>
> >> If there's any way to zip up the index and send it to me, even just
> >> the files for the one segment that has the corrupted doc, that'd be
> >> great.
> >>
> >>>>
> >>>> Any prior exceptions on this index?
> >>>
> >>> Not that I can recall. One of the indexes was made months ago, prob
> >>> with
> >>> a 2.0 or 2.1 Lucene, the second was made with a post 2.2 Lucene. One
> >>> site was windows 2003, the other AIX. One site was only 30,000
> >>> docs, the
> >>> other over 1 million.
> >>>
> >>>>
> >>>> Are your docs a variable schema (different fields)?
> >>>
> >>> Yes. Lots of different fields depending on the doc.
> >>>
> >>>>
> >>>> Mike
> >>>
> >>> Thanks Mike. I am currently trying to duplicate this. I can't go to
> >>> another site without testing some kind of fix.
> >>>
> >>>>
> >>>> Mark Miller wrote:
> >>>>> Yeah, its pretty close to 2.3.2, but I think from last week mabye.
> >>>>>
> >>>>> I finally have one of the stack traces (this comes on the tail
> >>>>> complete
> >>>>> laptop failure so I am scrambling here)
> >>>>>
> >>>>> java.lang.IndexOutOfBoundsException: Index: 97, Size: 43
> >>>>>         at java.util.ArrayList.RangeCheck(ArrayList.java:572)
> >>>>>         at java.util.ArrayList.get(ArrayList.java:347)
> >>>>>         at org.apache.lucene.index.FieldInfos.fieldInfo
> >>>>> (FieldInfos.java:260)
> >>>>>         at org.apache.lucene.index.FieldsReader.doc
> >>>>> (FieldsReader.java:184)
> >>>>>         at org.apache.lucene.index.SegmentReader.document
> >>>>> (SegmentReader.java:670)
> >>>>>         at org.apache.lucene.index.MultiSegmentReader.document
> >>>>> (MultiSegmentReader.java:257)
> >>>>>         at org.apache.lucene.search.IndexSearcher.doc
> >>>>> (IndexSearcher.java:97)
> >>>>>
> >>>>> On Mon, 2008-05-05 at 14:48 -0500, crspan wrote:
> >>>>>> coincidence or it is from 2.3.2 ?
> >>>>>>
> >>>>>> env:
> >>>>>> lucene 2.3.2
> >>>>>> jdk1.6.0_06 & jdk1.5.0_15
> >>>>>>
> >>>>>>
> >>>>>> QueryString:
> >>>>>> illeg^30.820824 technolog^22.290413 transfer^33.307804
> >>>>>> Error: java.lang.ArrayIndexOutOfBoundsException:
> >>>>>> 132704java.lang.ArrayIndexOutOfBoundsException: 132704
> >>>>>> at
> >>>>>> org.apache.lucene.search.BooleanScorer2$Coordinator.coordFactor
> >>>>>> (BooleanScorer2.java:55)
> >>>>>> at org.apache.lucene.search.BooleanScorer2.score
> >>>>>> (BooleanScorer2.java:358)
> >>>>>> at org.apache.lucene.search.BooleanScorer2.score
> >>>>>> (BooleanScorer2.java:320)
> >>>>>> at org.apache.lucene.search.IndexSearcher.search
> >>>>>> (IndexSearcher.java:146)
> >>>>>> at org.apache.lucene.search.IndexSearcher.search
> >>>>>> (IndexSearcher.java:113)
> >>>>>> at org.apache.lucene.search.Searcher.search(Searcher.java:132)
> >>>>>> at
> >>>>>> org.cr.search.TrecQueryRelevanceFeedback.main
> >>>>>> (TrecQueryRelevanceFeedback.java:776)
> >>>>>>
> >>>>>>
> >>>>>> QueryString:
> >>>>>> oceanograph^68.48028 vessel^43.191563
> >>>>>> Error:
> >>>>>> java.lang.ArrayIndexOutOfBoundsExceptionjava.lang.ArrayIndexOutOf

> >>>>>> Bo
> >>>>>> un
> >>>>>> dsException
> >>>>>> at java.lang.System.arraycopy(Native Method)
> >>>>>> at
> >>>>>> org.apache.lucene.index.TermVectorsReader.readTermVector
> >>>>>> (TermVectorsReader.java:353)
> >>>>>> at
> >>>>>> org.apache.lucene.index.TermVectorsReader.readTermVectors
> >>>>>> (TermVectorsReader.java:287)
> >>>>>> at org.apache.lucene.index.TermVectorsReader.get
> >>>>>> (TermVectorsReader.java:232)
> >>>>>> at
> >>>>>> org.apache.lucene.index.SegmentReader.getTermFreqVectors
> >>>>>> (SegmentReader.java:981)
> >>>>>> at org.cr.rf.RelevanceFeedback.RelFeedbackWeight
> >>>>>> (RelevanceFeedback.java:134)
> >>>>>> at
> >>>>>> org.cr.search.TrecQueryRelevanceFeedback.main
> >>>>>> (TrecQueryRelevanceFeedback.java:781)
> >>>>>>
> >>>>>>
> >>>>>>
> >>>>>>
> >>>>>> Mark Miller wrote:
> >>>>>>> Any recent changes that would expose index corruption?
> >>>>>>>
> >>>>>>> I am getting two new errors when trying to search:
> >>>>>>>
> >>>>>>> nullpointer fieldsreaders line 260
> >>>>>>>
> >>>>>>> indexoutofbounds on fieldinfo line 185
> >>>>>>>
> >>>>>>> I am kind of screwed, because reindexing fixes this, but
I cant
> >>>>>>> reindex!
> >>>>>>>
> >>>>>>> Any ideas?
> >>>>>>>
> >>>>>>>
> >>>>>>> ----------------------------------------------------------------

> >>>>>>> --
> >>>>>>> --
> >>>>>>> -
> >>>>>>> To unsubscribe, e-mail: java-user-unsubscribe@lucene.apache.org
> >>>>>>> For additional commands, e-mail: java-user- 
> >>>>>>> help@lucene.apache.org
> >>>>>>>
> >>>>>>>
> >>>>>>
> >>>>>>
> >>>>>> -----------------------------------------------------------------

> >>>>>> --
> >>>>>> --
> >>>>>> To unsubscribe, e-mail: java-user-unsubscribe@lucene.apache.org
> >>>>>> For additional commands, e-mail: java-user-help@lucene.apache.org
> >>>>>>
> >>>>>
> >>>>>
> >>>>> ------------------------------------------------------------------

> >>>>> --
> >>>>> -
> >>>>> To unsubscribe, e-mail: java-user-unsubscribe@lucene.apache.org
> >>>>> For additional commands, e-mail: java-user-help@lucene.apache.org
> >>>>>
> >>>>
> >>>>
> >>>> -------------------------------------------------------------------

> >>>> --
> >>>> To unsubscribe, e-mail: java-user-unsubscribe@lucene.apache.org
> >>>> For additional commands, e-mail: java-user-help@lucene.apache.org
> >>>>
> >>>
> >>>
> >>> -------------------------------------------------------------------- 
> >>> -
> >>> To unsubscribe, e-mail: java-user-unsubscribe@lucene.apache.org
> >>> For additional commands, e-mail: java-user-help@lucene.apache.org
> >>>
> >>
> >>
> >> ---------------------------------------------------------------------
> >> To unsubscribe, e-mail: java-user-unsubscribe@lucene.apache.org
> >> For additional commands, e-mail: java-user-help@lucene.apache.org
> >>
> >
> >
> > ---------------------------------------------------------------------
> > To unsubscribe, e-mail: java-user-unsubscribe@lucene.apache.org
> > For additional commands, e-mail: java-user-help@lucene.apache.org
> >
> 
> 
> ---------------------------------------------------------------------
> To unsubscribe, e-mail: java-user-unsubscribe@lucene.apache.org
> For additional commands, e-mail: java-user-help@lucene.apache.org
> 


---------------------------------------------------------------------
To unsubscribe, e-mail: java-user-unsubscribe@lucene.apache.org
For additional commands, e-mail: java-user-help@lucene.apache.org


Mime
View raw message