lucene-java-user mailing list archives

From Michael Busch <busch...@gmail.com>
Subject Re: index corruption with latest lucene
Date Mon, 05 May 2008 22:29:11 GMT
Mark Miller wrote:
> MB: Ah, thanks for clearing the version stuff up...I just assumed that
> trunk last week was pretty close to 2.3.1. I am def trunk last thurs or
> fri. Perhaps the problem is after 2.3.1, and perhaps the problem is only
> with me.
> 

OK, thanks for verifying. I'll go ahead and publish 2.3.2 then. If it 
still turns out that this problem also occurs on the 2_3 branch then we 
can always make a 2.3.3 release.

-Michael

> MM: FYI - I upgraded a really old test install (hasn't been touched by a
> new version of Lucene since mid last year) and it had no issues in it.
> The install was almost identical to the install I had problems with, the
> difference being that the problem install was created and touched by
> later versions of Lucene.
> 
> I often (foolishly/bravely) work off the trunk for features I want, so
> it's entirely possible I have created my own pain, and point release
> users may not be affected.
> 
> The win 2003 java version is 1.6.0_5
> 
> The AIX version I cannot check right now, but is 1.5 something (prob one
> version before the one they released a week or 3 ago).
> 
> - Mark
> 
> 
> On Mon, 2008-05-05 at 18:07 -0400, Michael McCandless wrote:
>> Mark,
>>
>> Which exact version of the JRE are you using?
>>
>> Mike
>>
>> Mark Miller wrote:
>>> On Mon, 2008-05-05 at 17:26 -0400, Michael McCandless wrote:
>>>> Actually that stack trace looks like it's from trunk, not from 2.3.2
>>>> (pre)?  OK, I think you said it's from "post 2.3 trunk".
>>> Right...the Lucene that showed the problem was built from a trunk grab
>>> late last week. One of the problem indexes was built with a 2.0 or 2.1
>>> and the other was built with a post 2.3 trunk (but weeks (prob months)
>>> before the one I grabbed late last week :) )
>>>
>>>> Another question: is autoCommit false or true?
>>> false
>>>
>>>
>>>
>>> If I can get you an affected index I will.
>>>
>>> - mark
>>>
>>>
>>>> More responses below:
>>>>
>>>> Mark Miller wrote:
>>>>> On Mon, 2008-05-05 at 16:32 -0400, Michael McCandless wrote:
>>>>>> Hi Mark,
>>>>>>
>>>>>> Not good!
>>>>>>
>>>>>> Can you describe how this index was created?  Did you use multiple
>>>>>> threads on one IndexWriter?  Multiple sessions of IndexWriter
>>>>>> appending to the index?  addIndexes*?  Is the index copied from one
>>>>>> place to another after being written and before being searched?
>>>>> Both sites were created by a single thread on a single IndexWriter.
>>>>> Updates are done through multiple threads and one IndexWriter. No
>>>>> addIndexes. Index was never copied, always same path.
>>>>>
>>>>>> If you run CheckIndex, what does it report?
>>>>> This was my next move...unfortunately, someone accidentally kicked off a
>>>>> complete reindex before I could do it. From what I can tell by the stack
>>>>> trace, it's a per doc problem...I am guessing I could have printed the
>>>>> ids of the problem docs and just reindexed those? I have to deal with this
>>>>> at many other sites, so that may be my attack...I cannot reindex
>>>>> everything to fix.
>>>> It would be great to know if that workaround works (and indeed it's a
>>>> per-doc issue).  I'd also love to know how many docs are affected,
>>>> when you hit this.
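
The per-doc workaround discussed above (find the ids of the documents that fail to load, count them, and reindex only those) could be sketched roughly as follows. This is only an illustration under stated assumptions: BadDocScan and DocLoader are hypothetical names, and DocLoader is a stand-in for a stored-document lookup such as IndexReader.document(int), so the sketch runs without Lucene on the classpath. The simulated failures in main are made up for the demo.

```java
import java.util.ArrayList;
import java.util.List;

// Hypothetical helper, not part of Lucene.
class BadDocScan {

    /** Stand-in for a stored-document lookup, e.g. IndexReader.document(int). */
    interface DocLoader {
        Object load(int docId) throws Exception;
    }

    /**
     * Tries to load every doc id in [0, maxDoc) and returns the ids
     * whose load threw an exception, in ascending order.
     */
    static List<Integer> findUnreadableDocs(int maxDoc, DocLoader loader) {
        List<Integer> bad = new ArrayList<>();
        for (int docId = 0; docId < maxDoc; docId++) {
            try {
                loader.load(docId);
            } catch (Exception e) {
                // Record the failing id and keep scanning the remaining docs.
                bad.add(docId);
            }
        }
        return bad;
    }

    public static void main(String[] args) {
        // Simulated index of 8 docs where ids 2 and 5 fail to load (made-up data).
        DocLoader simulated = docId -> {
            if (docId == 2 || docId == 5) {
                throw new IndexOutOfBoundsException("Index: 97, Size: 43");
            }
            return "doc-" + docId;
        };
        List<Integer> bad = findUnreadableDocs(8, simulated);
        System.out.println(bad.size() + " unreadable doc ids: " + bad);
    }
}
```

Against a real index the loader would presumably wrap the reader's document lookup, with maxDoc taken from the reader. Deleted documents would need to be skipped first, since loading one may throw for reasons unrelated to corruption.
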
>>>>
>>>> If there's any way to zip up the index and send it to me, even just
>>>> the files for the one segment that has the corrupted doc, that'd be
>>>> great.
>>>>
>>>>>> Any prior exceptions on this index?
>>>>> Not that I can recall. One of the indexes was made months ago, prob with
>>>>> a 2.0 or 2.1 Lucene, the second was made with a post 2.2 Lucene. One
>>>>> site was windows 2003, the other AIX. One site was only 30,000 docs, the
>>>>> other over 1 million.
>>>>>
>>>>>> Are your docs a variable schema (different fields)?
>>>>> Yes. Lots of different fields depending on the doc.
>>>>>
>>>>>> Mike
>>>>> Thanks Mike. I am currently trying to duplicate this. I can't go to
>>>>> another site without testing some kind of fix.
>>>>>
>>>>>> Mark Miller wrote:
>>>>>>> Yeah, it's pretty close to 2.3.2, but I think from last week maybe.
>>>>>>>
>>>>>>> I finally have one of the stack traces (this comes on the tail of a
>>>>>>> complete laptop failure so I am scrambling here)
>>>>>>>
>>>>>>> java.lang.IndexOutOfBoundsException: Index: 97, Size: 43
>>>>>>>         at java.util.ArrayList.RangeCheck(ArrayList.java:572)
>>>>>>>         at java.util.ArrayList.get(ArrayList.java:347)
>>>>>>>         at org.apache.lucene.index.FieldInfos.fieldInfo(FieldInfos.java:260)
>>>>>>>         at org.apache.lucene.index.FieldsReader.doc(FieldsReader.java:184)
>>>>>>>         at org.apache.lucene.index.SegmentReader.document(SegmentReader.java:670)
>>>>>>>         at org.apache.lucene.index.MultiSegmentReader.document(MultiSegmentReader.java:257)
>>>>>>>         at org.apache.lucene.search.IndexSearcher.doc(IndexSearcher.java:97)
>>>>>>>
>>>>>>> On Mon, 2008-05-05 at 14:48 -0500, crspan wrote:
>>>>>>>> coincidence or it is from 2.3.2 ?
>>>>>>>>
>>>>>>>> env:
>>>>>>>> lucene 2.3.2
>>>>>>>> jdk1.6.0_06 & jdk1.5.0_15
>>>>>>>>
>>>>>>>>
>>>>>>>> QueryString:
>>>>>>>> illeg^30.820824 technolog^22.290413 transfer^33.307804
>>>>>>>> Error: java.lang.ArrayIndexOutOfBoundsException: 132704
>>>>>>>> java.lang.ArrayIndexOutOfBoundsException: 132704
>>>>>>>> at org.apache.lucene.search.BooleanScorer2$Coordinator.coordFactor(BooleanScorer2.java:55)
>>>>>>>> at org.apache.lucene.search.BooleanScorer2.score(BooleanScorer2.java:358)
>>>>>>>> at org.apache.lucene.search.BooleanScorer2.score(BooleanScorer2.java:320)
>>>>>>>> at org.apache.lucene.search.IndexSearcher.search(IndexSearcher.java:146)
>>>>>>>> at org.apache.lucene.search.IndexSearcher.search(IndexSearcher.java:113)
>>>>>>>> at org.apache.lucene.search.Searcher.search(Searcher.java:132)
>>>>>>>> at org.cr.search.TrecQueryRelevanceFeedback.main(TrecQueryRelevanceFeedback.java:776)
>>>>>>>>
>>>>>>>>
>>>>>>>> QueryString:
>>>>>>>> oceanograph^68.48028 vessel^43.191563
>>>>>>>> Error: java.lang.ArrayIndexOutOfBoundsException
>>>>>>>> java.lang.ArrayIndexOutOfBoundsException
>>>>>>>> at java.lang.System.arraycopy(Native Method)
>>>>>>>> at org.apache.lucene.index.TermVectorsReader.readTermVector(TermVectorsReader.java:353)
>>>>>>>> at org.apache.lucene.index.TermVectorsReader.readTermVectors(TermVectorsReader.java:287)
>>>>>>>> at org.apache.lucene.index.TermVectorsReader.get(TermVectorsReader.java:232)
>>>>>>>> at org.apache.lucene.index.SegmentReader.getTermFreqVectors(SegmentReader.java:981)
>>>>>>>> at org.cr.rf.RelevanceFeedback.RelFeedbackWeight(RelevanceFeedback.java:134)
>>>>>>>> at org.cr.search.TrecQueryRelevanceFeedback.main(TrecQueryRelevanceFeedback.java:781)
>>>>>>>>
>>>>>>>>
>>>>>>>>
>>>>>>>>
>>>>>>>> Mark Miller wrote:
>>>>>>>>> Any recent changes that would expose index corruption?
>>>>>>>>>
>>>>>>>>> I am getting two new errors when trying to search:
>>>>>>>>>
>>>>>>>>> nullpointer fieldsreaders line 260
>>>>>>>>>
>>>>>>>>> indexoutofbounds on fieldinfo line 185
>>>>>>>>>
>>>>>>>>> I am kind of screwed, because reindexing fixes this, but I can't
>>>>>>>>> reindex!
>>>>>>>>>
>>>>>>>>> Any ideas?
>>>>>>>>>
>>>>>>>>>
>>>>>>>>> ---------------------------------------------------------------------
>>>>>>>>> To unsubscribe, e-mail: java-user-unsubscribe@lucene.apache.org
>>>>>>>>> For additional commands, e-mail: java-user-help@lucene.apache.org