lucene-java-user mailing list archives

Site index · List index
Message view « Date » · « Thread »
Top « Date » · « Thread »
From Michael McCandless <luc...@mikemccandless.com>
Subject Re: index corruption with latest lucene
Date Mon, 05 May 2008 22:07:48 GMT

Mark,

Which exact version of the JRE are you using?

Mike

Mark Miller wrote:
> On Mon, 2008-05-05 at 17:26 -0400, Michael McCandless wrote:
>> Actually that stack trace looks like it's from trunk, not from 2.3.2
>> (pre)?  OK, I think you said it's from "post 2.3 trunk".
>
> Right...the Lucene that showed the problem was build from a trunk grab
> late last week. One of the problem indexes was built with a 2.0 or 2.1
> and the other was built with a post 2.3 trunk (but weeks (prob months)
> before the one i grabbed late last week :) )
>
>>
>> Another question: is autoCommit false or true?
> false
>
>
>
> If I can get you an affected index I will.
>
> - mark
>
>
>>
>> More responses below:
>>
>> Mark Miller wrote:
>>> On Mon, 2008-05-05 at 16:32 -0400, Michael McCandless wrote:
>>>> Hi Mark,
>>>>
>>>> Not good!
>>>>
>>>> Can you describe how this index was created?  Did you use multiple
>>>> threads on one IndexWriter?  Multiple sessions of IndexWriter
>>>> appending to the index?  addIndexes*?  Is the index copied from one
>>>> place to another after being written and before being searched?
>>>
>>> Both sites were created by a single thread on a single IndexWriter.
>>> Updates are done through multiple threads and one IndexWriter. No
>>> addIndexes. Index was never copied, always same path.
>>>
>>>>
>>>> If you run CheckIndex, what does it report?
>>>
>>> This was my next move...unfortunately, someone accidentally kicked
>>> off a
>>> complete reindex before I could do it. From what I can tell by the
>>> stack
>>> trace, its a per doc problem...I am guessing I could have   
>>> printed the
>>> ids of the problem docs and just reindex those? I have to deal with
>>> this
>>> at many other sites, so that may be my attack...I cannot reindex
>>> everything to fix.
>>
>> It would be great to know if that workaround works (and indeed it's a
>> per-doc issue).  I'd also love to know how many docs are affected,
>> when you hit this.
>>
>> If there's any way to zip up the index and send it to me, even just
>> the files for the one segment that has the corrupted doc, that'd be
>> great.
>>
>>>>
>>>> Any prior exceptions on this index?
>>>
>>> Not that I can recall. One of the indexes was made months ago, prob
>>> with
>>> a 2.0 or 2.1 Lucene, the second was made with a post 2.2 Lucene. One
>>> site was windows 2003, the other AIX. One site was only 30,000
>>> docs, the
>>> other over 1 million.
>>>
>>>>
>>>> Are your docs a variable schema (different fields)?
>>>
>>> Yes. Lots of different fields depending on the doc.
>>>
>>>>
>>>> Mike
>>>
>>> Thanks Mike. I am currently trying to duplicate this. I can't go to
>>> another site without testing some kind of fix.
>>>
>>>>
>>>> Mark Miller wrote:
>>>>> Yeah, its pretty close to 2.3.2, but I think from last week mabye.
>>>>>
>>>>> I finally have one of the stack traces (this comes on the tail
>>>>> complete
>>>>> laptop failure so I am scrambling here)
>>>>>
>>>>> java.lang.IndexOutOfBoundsException: Index: 97, Size: 43
>>>>>         at java.util.ArrayList.RangeCheck(ArrayList.java:572)
>>>>>         at java.util.ArrayList.get(ArrayList.java:347)
>>>>>         at org.apache.lucene.index.FieldInfos.fieldInfo
>>>>> (FieldInfos.java:260)
>>>>>         at org.apache.lucene.index.FieldsReader.doc
>>>>> (FieldsReader.java:184)
>>>>>         at org.apache.lucene.index.SegmentReader.document
>>>>> (SegmentReader.java:670)
>>>>>         at org.apache.lucene.index.MultiSegmentReader.document
>>>>> (MultiSegmentReader.java:257)
>>>>>         at org.apache.lucene.search.IndexSearcher.doc
>>>>> (IndexSearcher.java:97)
>>>>>
>>>>> On Mon, 2008-05-05 at 14:48 -0500, crspan wrote:
>>>>>> coincidence or it is from 2.3.2 ?
>>>>>>
>>>>>> env:
>>>>>> lucene 2.3.2
>>>>>> jdk1.6.0_06 & jdk1.5.0_15
>>>>>>
>>>>>>
>>>>>> QueryString:
>>>>>> illeg^30.820824 technolog^22.290413 transfer^33.307804
>>>>>> Error: java.lang.ArrayIndexOutOfBoundsException:
>>>>>> 132704java.lang.ArrayIndexOutOfBoundsException: 132704
>>>>>> at
>>>>>> org.apache.lucene.search.BooleanScorer2$Coordinator.coordFactor
>>>>>> (BooleanScorer2.java:55)
>>>>>> at org.apache.lucene.search.BooleanScorer2.score
>>>>>> (BooleanScorer2.java:358)
>>>>>> at org.apache.lucene.search.BooleanScorer2.score
>>>>>> (BooleanScorer2.java:320)
>>>>>> at org.apache.lucene.search.IndexSearcher.search
>>>>>> (IndexSearcher.java:146)
>>>>>> at org.apache.lucene.search.IndexSearcher.search
>>>>>> (IndexSearcher.java:113)
>>>>>> at org.apache.lucene.search.Searcher.search(Searcher.java:132)
>>>>>> at
>>>>>> org.cr.search.TrecQueryRelevanceFeedback.main
>>>>>> (TrecQueryRelevanceFeedback.java:776)
>>>>>>
>>>>>>
>>>>>> QueryString:
>>>>>> oceanograph^68.48028 vessel^43.191563
>>>>>> Error:
>>>>>> java.lang.ArrayIndexOutOfBoundsExceptionjava.lang.ArrayIndexOutOf

>>>>>> Bo
>>>>>> un
>>>>>> dsException
>>>>>> at java.lang.System.arraycopy(Native Method)
>>>>>> at
>>>>>> org.apache.lucene.index.TermVectorsReader.readTermVector
>>>>>> (TermVectorsReader.java:353)
>>>>>> at
>>>>>> org.apache.lucene.index.TermVectorsReader.readTermVectors
>>>>>> (TermVectorsReader.java:287)
>>>>>> at org.apache.lucene.index.TermVectorsReader.get
>>>>>> (TermVectorsReader.java:232)
>>>>>> at
>>>>>> org.apache.lucene.index.SegmentReader.getTermFreqVectors
>>>>>> (SegmentReader.java:981)
>>>>>> at org.cr.rf.RelevanceFeedback.RelFeedbackWeight
>>>>>> (RelevanceFeedback.java:134)
>>>>>> at
>>>>>> org.cr.search.TrecQueryRelevanceFeedback.main
>>>>>> (TrecQueryRelevanceFeedback.java:781)
>>>>>>
>>>>>>
>>>>>>
>>>>>>
>>>>>> Mark Miller wrote:
>>>>>>> Any recent changes that would expose index corruption?
>>>>>>>
>>>>>>> I am getting two new errors when trying to search:
>>>>>>>
>>>>>>> nullpointer fieldsreaders line 260
>>>>>>>
>>>>>>> indexoutofbounds on fieldinfo line 185
>>>>>>>
>>>>>>> I am kind of screwed, because reindexing fixes this, but I cant
>>>>>>> reindex!
>>>>>>>
>>>>>>> Any ideas?
>>>>>>>
>>>>>>>
>>>>>>> ----------------------------------------------------------------

>>>>>>> --
>>>>>>> --
>>>>>>> -
>>>>>>> To unsubscribe, e-mail: java-user-unsubscribe@lucene.apache.org
>>>>>>> For additional commands, e-mail: java-user- 
>>>>>>> help@lucene.apache.org
>>>>>>>
>>>>>>>
>>>>>>
>>>>>>
>>>>>> -----------------------------------------------------------------

>>>>>> --
>>>>>> --
>>>>>> To unsubscribe, e-mail: java-user-unsubscribe@lucene.apache.org
>>>>>> For additional commands, e-mail: java-user-help@lucene.apache.org
>>>>>>
>>>>>
>>>>>
>>>>> ------------------------------------------------------------------ 
>>>>> --
>>>>> -
>>>>> To unsubscribe, e-mail: java-user-unsubscribe@lucene.apache.org
>>>>> For additional commands, e-mail: java-user-help@lucene.apache.org
>>>>>
>>>>
>>>>
>>>> ------------------------------------------------------------------- 
>>>> --
>>>> To unsubscribe, e-mail: java-user-unsubscribe@lucene.apache.org
>>>> For additional commands, e-mail: java-user-help@lucene.apache.org
>>>>
>>>
>>>
>>> -------------------------------------------------------------------- 
>>> -
>>> To unsubscribe, e-mail: java-user-unsubscribe@lucene.apache.org
>>> For additional commands, e-mail: java-user-help@lucene.apache.org
>>>
>>
>>
>> ---------------------------------------------------------------------
>> To unsubscribe, e-mail: java-user-unsubscribe@lucene.apache.org
>> For additional commands, e-mail: java-user-help@lucene.apache.org
>>
>
>
> ---------------------------------------------------------------------
> To unsubscribe, e-mail: java-user-unsubscribe@lucene.apache.org
> For additional commands, e-mail: java-user-help@lucene.apache.org
>


---------------------------------------------------------------------
To unsubscribe, e-mail: java-user-unsubscribe@lucene.apache.org
For additional commands, e-mail: java-user-help@lucene.apache.org


Mime
View raw message