lucene-java-user mailing list archives

From Michael McCandless <luc...@mikemccandless.com>
Subject Re: index corruption with latest lucene
Date Tue, 06 May 2008 09:16:28 GMT

Are you using JRE 1.6.0_04 or 1.6.0_05?

This sounds exactly the same as this:

     http://www.gossamer-threads.com/lists/lucene/java-user/59650

If it is the same issue, which seems to be a bug in the HotSpot
compiler, then downgrading to JRE 1.6.0_03, or running Java with -Xbatch
(disables background compilation, so methods compile in the foreground)
or -Xint (interpreter only, no JIT compilation), works around it.

Can you test either of these and report back?  Thanks.
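
To make the report easier, here is a minimal sketch of a check you could run
inside the affected JVM. It prints the JRE version and the JVM arguments, and
flags whether the version matches the two update levels asked about above and
whether -Xbatch or -Xint is already set. The class name JreCheck and both
helper methods are made up for illustration; only System.getProperty and the
standard RuntimeMXBean are real APIs, and the version match is a plain string
prefix test, not a definitive test for the bug.

```java
import java.lang.management.ManagementFactory;
import java.util.List;

class JreCheck {
    // Hypothetical helper: true for the two update levels asked about in this thread.
    // A prefix match is used in case the version string carries a build suffix.
    static boolean isSuspectVersion(String javaVersion) {
        return javaVersion.startsWith("1.6.0_04") || javaVersion.startsWith("1.6.0_05");
    }

    // Hypothetical helper: true if either suggested workaround flag is present.
    static boolean hasWorkaroundFlag(List<String> jvmArgs) {
        return jvmArgs.contains("-Xbatch") || jvmArgs.contains("-Xint");
    }

    public static void main(String[] args) {
        String version = System.getProperty("java.version");
        // Arguments passed to the JVM itself (not to main), e.g. -Xmx512m -Xbatch.
        List<String> jvmArgs = ManagementFactory.getRuntimeMXBean().getInputArguments();

        System.out.println("java.version  = " + version);
        System.out.println("JVM arguments = " + jvmArgs);
        System.out.println("suspect JRE   = " + isSuspectVersion(version));
        System.out.println("workaround on = " + hasWorkaroundFlag(jvmArgs));
    }
}
```

Running it once with the normal startup options and once with -Xbatch or
-Xint added shows whether the flag actually reached the JVM.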

Mike

Gopikrishnan Subramani wrote:
> [ Sorry if I'm hijacking this thread; if you feel this error is unrelated to
> this thread, I'll move this to a separate thread. ]
>
> Even after upgrading to 2.3.1 I'm running into index corruption problems.
> I'm posting below the exception that is generated while searching. The stack
> trace looks like this:
>
>
> org.apache.lucene.index.CorruptIndexException: doc counts differ for segment _kk: fieldsReader shows 72670 but segmentInfo shows 72671
>         at org.apache.lucene.index.SegmentReader.initialize(SegmentReader.java:313)
>         at org.apache.lucene.index.SegmentReader.get(SegmentReader.java:262)
>         at org.apache.lucene.index.SegmentReader.get(SegmentReader.java:230)
>         at org.apache.lucene.index.DirectoryIndexReader$1.doBody(DirectoryIndexReader.java:73)
>         at org.apache.lucene.index.SegmentInfos$FindSegmentsFile.run(SegmentInfos.java:636)
>         at org.apache.lucene.index.DirectoryIndexReader.open(DirectoryIndexReader.java:63)
>         at org.apache.lucene.index.IndexReader.open(IndexReader.java:209)
>         at org.apache.lucene.index.IndexReader.open(IndexReader.java:173)
>         at org.apache.lucene.search.IndexSearcher.<init>(IndexSearcher.java:48)
>
>
> About the setup: all documents in these indexes have the same set of
> fields, but some fields are not added if the value is null. We have over 500
> indexes and they are indexed incrementally on a daily basis. The index is
> updated in place with autocommit turned on. A single thread writes to the
> index and once all the documents are updated, the index is committed. About 5
> indexes are getting corrupted per week on average and a full reindex fixes
> the problem. This is proving to be a lot of pain and any help in identifying
> the problem is much appreciated.
>
> thanks,
> Gopi
>
> On 5/6/08, Mark Miller <markrmiller@gmail.com> wrote:
>>
>> I am getting even more confused. I luckily found a copy of one of the
>> corrupted test indices that I had made on 4/28/08...lucky as it's the
>> only one I have ever made :) It doesn't have the problem. This is very
>> interesting to me, because the other site that has the problem has been
>> in action for months now. Both were running with my previous version of
>> Lucene, which was a trunk build from around when 2.3 was released, I
>> think. Just seems odd that the test index was corrupted so recently.
>>
>>
>> So I am a bit stuck...it's probably my own problem though, so unless
>> someone else sees it, I'll just report back if/when I find out more.
>>
>> - Mark
>>
>>
>> On Mon, 2008-05-05 at 18:07 -0400, Michael McCandless wrote:
>>> Mark,
>>>
>>> Which exact version of the JRE are you using?
>>>
>>> Mike
>>>
>>> Mark Miller wrote:
>>>> On Mon, 2008-05-05 at 17:26 -0400, Michael McCandless wrote:
>>>>> Actually that stack trace looks like it's from trunk, not from 2.3.2
>>>>> (pre)?  OK, I think you said it's from "post 2.3 trunk".
>>>>
>>>> Right...the Lucene that showed the problem was built from a trunk grab
>>>> late last week. One of the problem indexes was built with a 2.0 or 2.1
>>>> and the other was built with a post 2.3 trunk (but weeks (prob months)
>>>> before the one I grabbed late last week :) )
>>>>
>>>>>
>>>>> Another question: is autoCommit false or true?
>>>> false
>>>>
>>>>
>>>>
>>>> If I can get you an affected index I will.
>>>>
>>>> - mark
>>>>
>>>>
>>>>>
>>>>> More responses below:
>>>>>
>>>>> Mark Miller wrote:
>>>>>> On Mon, 2008-05-05 at 16:32 -0400, Michael McCandless wrote:
>>>>>>> Hi Mark,
>>>>>>>
>>>>>>> Not good!
>>>>>>>
>>>>>>> Can you describe how this index was created?  Did you use multiple
>>>>>>> threads on one IndexWriter?  Multiple sessions of IndexWriter
>>>>>>> appending to the index?  addIndexes*?  Is the index copied from one
>>>>>>> place to another after being written and before being searched?
>>>>>>
>>>>>> Both sites were created by a single thread on a single IndexWriter.
>>>>>> Updates are done through multiple threads and one IndexWriter. No
>>>>>> addIndexes. Index was never copied, always same path.
>>>>>>
>>>>>>>
>>>>>>> If you run CheckIndex, what does it report?
>>>>>>
>>>>>> This was my next move...unfortunately, someone accidentally kicked off a
>>>>>> complete reindex before I could do it. From what I can tell by the stack
>>>>>> trace, it's a per-doc problem...I am guessing I could have printed the
>>>>>> ids of the problem docs and just reindexed those? I have to deal with this
>>>>>> at many other sites, so that may be my attack...I cannot reindex
>>>>>> everything to fix.
>>>>>
>>>>> It would be great to know if that workaround works (and indeed it's a
>>>>> per-doc issue).  I'd also love to know how many docs are affected,
>>>>> when you hit this.
>>>>>
>>>>> If there's any way to zip up the index and send it to me, even just
>>>>> the files for the one segment that has the corrupted doc, that'd be
>>>>> great.
>>>>>
>>>>>>>
>>>>>>> Any prior exceptions on this index?
>>>>>>
>>>>>> Not that I can recall. One of the indexes was made months ago, prob with
>>>>>> a 2.0 or 2.1 Lucene, the second was made with a post 2.2 Lucene. One
>>>>>> site was Windows 2003, the other AIX. One site was only 30,000 docs, the
>>>>>> other over 1 million.
>>>>>>
>>>>>>>
>>>>>>> Are your docs a variable schema (different fields)?
>>>>>>
>>>>>> Yes. Lots of different fields depending on the doc.
>>>>>>
>>>>>>>
>>>>>>> Mike
>>>>>>
>>>>>> Thanks Mike. I am currently trying to duplicate this. I can't go to
>>>>>> another site without testing some kind of fix.
>>>>>>
>>>>>>>
>>>>>>> Mark Miller wrote:
>>>>>>>> Yeah, it's pretty close to 2.3.2, but I think from last week maybe.
>>>>>>>>
>>>>>>>> I finally have one of the stack traces (this comes on the tail of a
>>>>>>>> complete laptop failure so I am scrambling here)
>>>>>>>>
>>>>>>>> java.lang.IndexOutOfBoundsException: Index: 97, Size: 43
>>>>>>>>         at java.util.ArrayList.RangeCheck(ArrayList.java:572)
>>>>>>>>         at java.util.ArrayList.get(ArrayList.java:347)
>>>>>>>>         at org.apache.lucene.index.FieldInfos.fieldInfo(FieldInfos.java:260)
>>>>>>>>         at org.apache.lucene.index.FieldsReader.doc(FieldsReader.java:184)
>>>>>>>>         at org.apache.lucene.index.SegmentReader.document(SegmentReader.java:670)
>>>>>>>>         at org.apache.lucene.index.MultiSegmentReader.document(MultiSegmentReader.java:257)
>>>>>>>>         at org.apache.lucene.search.IndexSearcher.doc(IndexSearcher.java:97)
>>>>>>>>
>>>>>>>> On Mon, 2008-05-05 at 14:48 -0500, crspan wrote:
>>>>>>>>> coincidence or it is from 2.3.2 ?
>>>>>>>>>
>>>>>>>>> env:
>>>>>>>>> lucene 2.3.2
>>>>>>>>> jdk1.6.0_06 & jdk1.5.0_15
>>>>>>>>>
>>>>>>>>>
>>>>>>>>> QueryString:
>>>>>>>>> illeg^30.820824 technolog^22.290413 transfer^33.307804
>>>>>>>>> Error: java.lang.ArrayIndexOutOfBoundsException: 132704
>>>>>>>>> java.lang.ArrayIndexOutOfBoundsException: 132704
>>>>>>>>> at org.apache.lucene.search.BooleanScorer2$Coordinator.coordFactor(BooleanScorer2.java:55)
>>>>>>>>> at org.apache.lucene.search.BooleanScorer2.score(BooleanScorer2.java:358)
>>>>>>>>> at org.apache.lucene.search.BooleanScorer2.score(BooleanScorer2.java:320)
>>>>>>>>> at org.apache.lucene.search.IndexSearcher.search(IndexSearcher.java:146)
>>>>>>>>> at org.apache.lucene.search.IndexSearcher.search(IndexSearcher.java:113)
>>>>>>>>> at org.apache.lucene.search.Searcher.search(Searcher.java:132)
>>>>>>>>> at org.cr.search.TrecQueryRelevanceFeedback.main(TrecQueryRelevanceFeedback.java:776)
>>>>>>>>>
>>>>>>>>>
>>>>>>>>> QueryString:
>>>>>>>>> oceanograph^68.48028 vessel^43.191563
>>>>>>>>> Error: java.lang.ArrayIndexOutOfBoundsException
>>>>>>>>> java.lang.ArrayIndexOutOfBoundsException
>>>>>>>>> at java.lang.System.arraycopy(Native Method)
>>>>>>>>> at org.apache.lucene.index.TermVectorsReader.readTermVector(TermVectorsReader.java:353)
>>>>>>>>> at org.apache.lucene.index.TermVectorsReader.readTermVectors(TermVectorsReader.java:287)
>>>>>>>>> at org.apache.lucene.index.TermVectorsReader.get(TermVectorsReader.java:232)
>>>>>>>>> at org.apache.lucene.index.SegmentReader.getTermFreqVectors(SegmentReader.java:981)
>>>>>>>>> at org.cr.rf.RelevanceFeedback.RelFeedbackWeight(RelevanceFeedback.java:134)
>>>>>>>>> at org.cr.search.TrecQueryRelevanceFeedback.main(TrecQueryRelevanceFeedback.java:781)
>>>>>>>>>
>>>>>>>>>
>>>>>>>>>
>>>>>>>>>
>>>>>>>>> Mark Miller wrote:
>>>>>>>>>> Any recent changes that would expose index corruption?
>>>>>>>>>>
>>>>>>>>>> I am getting two new errors when trying to search:
>>>>>>>>>>
>>>>>>>>>> nullpointer fieldsreaders line 260
>>>>>>>>>>
>>>>>>>>>> indexoutofbounds on fieldinfo line 185
>>>>>>>>>>
>>>>>>>>>> I am kind of screwed, because reindexing fixes this, but I can't
>>>>>>>>>> reindex!
>>>>>>>>>>
>>>>>>>>>> Any ideas?
>>>>>>>>>>
>>>>>>>>>>
>>>>>>>>>> ---------------------------------------------------------------------
>>>>>>>>>> To unsubscribe, e-mail: java-user-unsubscribe@lucene.apache.org
>>>>>>>>>> For additional commands, e-mail: java-user-help@lucene.apache.org