lucene-java-user mailing list archives

From Michael Busch <busch...@gmail.com>
Subject Re: index corruption with latest lucene
Date Mon, 05 May 2008 22:07:33 GMT
Yeah, it's probably confusing, because we currently commit patches to 
two branches: the trunk (/repos/asf/lucene/java/trunk) and the 2.3 
branch (/repos/asf/lucene/java/branches/lucene_2_3).

So if you checked out from the trunk, then this is not the 2.3.2 
version. The 2.3.2 release candidate is from the 2.3 branch, revision 
652650.

-Michael

Mark Miller wrote:
> Man, I have even confused myself on these versions at this point. Let me
> start over.
> 
> I am having the problem with a version of lucene that was the trunk late
> last week. Which pretty much means 2.3.2.
> 
> I'd hate to hold up the release if the problem was only me though. I am
> trying to work through it as fast as I can. I just have to find another
> index somewhere with the problem. It's just difficult because the indexes
> are very large and on remote live sites. I am hoping I can find another
> old test one with the problem or make one. The two installs on which I
> detected the problem were rebuilt, one inadvertently.
> 
> - Mark
> 
> On Mon, 2008-05-05 at 14:32 -0700, Michael Busch wrote:
>> If that is the case then I will go ahead and publish the 2.3.2 release? 
>> Have you seen this on 2.3.x, Mark?
>>
>> -Michael
>>
>> Michael McCandless wrote:
>>> Actually that stack trace looks like it's from trunk, not from 
>>> 2.3.2(pre)?  OK, I think you said it's from "post 2.3 trunk".
>>>
>>> Another question: is autoCommit false or true?
>>>
>>> More responses below:
>>>
>>> Mark Miller wrote:
>>>> On Mon, 2008-05-05 at 16:32 -0400, Michael McCandless wrote:
>>>>> Hi Mark,
>>>>>
>>>>> Not good!
>>>>>
>>>>> Can you describe how this index was created?  Did you use multiple
>>>>> threads on one IndexWriter?  Multiple sessions of IndexWriter
>>>>> appending to the index?  addIndexes*?  Is the index copied from one
>>>>> place to another after being written and before being searched?
>>>> Both sites were created by a single thread on a single IndexWriter.
>>>> Updates are done through multiple threads and one IndexWriter. No
>>>> addIndexes. Index was never copied, always same path.
>>>>
>>>>> If you run CheckIndex, what does it report?
>>>> This was my next move...unfortunately, someone accidentally kicked off a
>>>> complete reindex before I could do it. From what I can tell by the stack
>>>> trace, it's a per doc problem...I am guessing I could have printed the
>>>> ids of the problem docs and just reindex those? I have to deal with this
>>>> at many other sites, so that may be my attack...I cannot reindex
>>>> everything to fix.
>>> It would be great to know if that workaround works (and indeed it's a 
>>> per-doc issue).  I'd also love to know how many docs are affected, when 
>>> you hit this.
>>>
>>> If there's any way to zip up the index and send it to me, even just the 
>>> files for the one segment that has the corrupted doc, that'd be great.
>>>
>>>>> Any prior exceptions on this index?
>>>> Not that I can recall. One of the indexes was made months ago, prob with
>>>> a 2.0 or 2.1 Lucene, the second was made with a post 2.2 Lucene. One
>>>> site was windows 2003, the other AIX. One site was only 30,000 docs, the
>>>> other over 1 million.
>>>>
>>>>> Are your docs a variable schema (different fields)?
>>>> Yes. Lots of different fields depending on the doc.
>>>>
>>>>> Mike
>>>> Thanks Mike. I am currently trying to duplicate this. I can't go to
>>>> another site without testing some kind of fix.
>>>>
>>>>> Mark Miller wrote:
>>>>>> Yeah, it's pretty close to 2.3.2, but I think from last week maybe.
>>>>>>
>>>>>> I finally have one of the stack traces (this comes on the tail of a
>>>>>> complete laptop failure so I am scrambling here)
>>>>>>
>>>>>> java.lang.IndexOutOfBoundsException: Index: 97, Size: 43
>>>>>>         at java.util.ArrayList.RangeCheck(ArrayList.java:572)
>>>>>>         at java.util.ArrayList.get(ArrayList.java:347)
>>>>>>         at org.apache.lucene.index.FieldInfos.fieldInfo(FieldInfos.java:260)
>>>>>>         at org.apache.lucene.index.FieldsReader.doc(FieldsReader.java:184)
>>>>>>         at org.apache.lucene.index.SegmentReader.document(SegmentReader.java:670)
>>>>>>         at org.apache.lucene.index.MultiSegmentReader.document(MultiSegmentReader.java:257)
>>>>>>         at org.apache.lucene.search.IndexSearcher.doc(IndexSearcher.java:97)
>>>>>>
>>>>>> On Mon, 2008-05-05 at 14:48 -0500, crspan wrote:
>>>>>>> Coincidence, or is it from 2.3.2?
>>>>>>>
>>>>>>> env:
>>>>>>> lucene 2.3.2
>>>>>>> jdk1.6.0_06 & jdk1.5.0_15
>>>>>>>
>>>>>>>
>>>>>>> QueryString:
>>>>>>> illeg^30.820824 technolog^22.290413 transfer^33.307804
>>>>>>> Error: java.lang.ArrayIndexOutOfBoundsException: 132704
>>>>>>> java.lang.ArrayIndexOutOfBoundsException: 132704
>>>>>>> at org.apache.lucene.search.BooleanScorer2$Coordinator.coordFactor(BooleanScorer2.java:55)
>>>>>>> at org.apache.lucene.search.BooleanScorer2.score(BooleanScorer2.java:358)
>>>>>>> at org.apache.lucene.search.BooleanScorer2.score(BooleanScorer2.java:320)
>>>>>>> at org.apache.lucene.search.IndexSearcher.search(IndexSearcher.java:146)
>>>>>>> at org.apache.lucene.search.IndexSearcher.search(IndexSearcher.java:113)
>>>>>>> at org.apache.lucene.search.Searcher.search(Searcher.java:132)
>>>>>>> at org.cr.search.TrecQueryRelevanceFeedback.main(TrecQueryRelevanceFeedback.java:776)
>>>>>>>
>>>>>>>
>>>>>>> QueryString:
>>>>>>> oceanograph^68.48028 vessel^43.191563
>>>>>>> Error: java.lang.ArrayIndexOutOfBoundsException
>>>>>>> java.lang.ArrayIndexOutOfBoundsException
>>>>>>> at java.lang.System.arraycopy(Native Method)
>>>>>>> at org.apache.lucene.index.TermVectorsReader.readTermVector(TermVectorsReader.java:353)
>>>>>>> at org.apache.lucene.index.TermVectorsReader.readTermVectors(TermVectorsReader.java:287)
>>>>>>> at org.apache.lucene.index.TermVectorsReader.get(TermVectorsReader.java:232)
>>>>>>> at org.apache.lucene.index.SegmentReader.getTermFreqVectors(SegmentReader.java:981)
>>>>>>> at org.cr.rf.RelevanceFeedback.RelFeedbackWeight(RelevanceFeedback.java:134)
>>>>>>> at org.cr.search.TrecQueryRelevanceFeedback.main(TrecQueryRelevanceFeedback.java:781)
>>>>>>>
>>>>>>>
>>>>>>>
>>>>>>>
>>>>>>> Mark Miller wrote:
>>>>>>>> Any recent changes that would expose index corruption?
>>>>>>>>
>>>>>>>> I am getting two new errors when trying to search:
>>>>>>>>
>>>>>>>> nullpointer fieldsreaders line 260
>>>>>>>>
>>>>>>>> indexoutofbounds on fieldinfo line 185
>>>>>>>>
>>>>>>>> I am kind of screwed, because reindexing fixes this, but I can't
>>>>>>>> reindex!
>>>>>>>>
>>>>>>>> Any ideas?
>>>>>>>>
>>>>>>>>
>>>>>>>> ---------------------------------------------------------------------
>>>>>>>> To unsubscribe, e-mail: java-user-unsubscribe@lucene.apache.org
>>>>>>>> For additional commands, e-mail: java-user-help@lucene.apache.org


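The per-document workaround raised in the thread (print the ids of the problem docs and reindex only those) could look roughly like the sketch below. It is only an illustration, not code from the thread: `BadDocScan`, `DocLoader` and `findUnreadableDocs` are made-up names, and the loader is a stand-in so the example runs without Lucene. Against a real index the loader would presumably wrap `IndexReader.document(int)`, with `maxDoc()` as the upper bound and deleted docs skipped via `isDeleted(int)`.

```java
import java.util.ArrayList;
import java.util.List;

public class BadDocScan {

    /** Hypothetical stand-in for loading a doc's stored fields, e.g. IndexReader.document(int). */
    interface DocLoader {
        void load(int docId) throws Exception;
    }

    /** Tries to load every doc id in [0, maxDoc) and returns the ids whose load failed. */
    static List<Integer> findUnreadableDocs(int maxDoc, DocLoader loader) {
        List<Integer> bad = new ArrayList<Integer>();
        for (int docId = 0; docId < maxDoc; docId++) {
            try {
                loader.load(docId);
            } catch (Exception e) {
                // Runtime exceptions such as IndexOutOfBoundsException or
                // NullPointerException are caught here as well.
                bad.add(docId);
            }
        }
        return bad;
    }

    public static void main(String[] args) {
        // Fake loader (assumption for the demo): docs 3 and 7 fail the way
        // the FieldInfos range check does in the stack trace above.
        DocLoader fake = new DocLoader() {
            public void load(int docId) {
                if (docId == 3 || docId == 7) {
                    throw new IndexOutOfBoundsException("Index: 97, Size: 43");
                }
            }
        };
        System.out.println(findUnreadableDocs(10, fake)); // prints [3, 7]
    }
}
```

The returned ids would then be the candidates for a targeted reindex. Whether reindexing only those docs actually clears the problem is the open question in the thread, so this only covers the detection half.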