From: Michael McCandless
To: java-user@lucene.apache.org
Subject: Re: background merge hit exception
Date: Thu, 26 Feb 2009 06:52:02 -0500

The exception you hit during optimize indicates some corruption in the
stored fields file (_X.fdt).

The exception hit during CheckIndex, though, was different -- the
postings entry (_r0y.frq) for the term "desthost:wir" was supposed to
have 6 entries but had 0.

Did "CheckIndex -fix" allow you to optimize successfully?  If so, both
of these different corruptions were in the same segment (_r0y), which
CheckIndex removed.

Was this index fully created with 2.4.0, or partly with a prior
release?  Which exact JRE was used to create/write to this index, and
which one was used for searching it?  How often do you hit this?  Is
it repeatable?  It's remotely possible you have bad hardware (RAM,
hard drive).

Most likely it's gone... but if somehow I could get access to that bad
segment, I could try to dig.
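For reference, CheckIndex ships in Lucene's core jar and is run from
the command line against an index no writer currently has open (the
jar name and index path below are just examples):

    java -cp lucene-core-2.4.0.jar org.apache.lucene.index.CheckIndex \
        /path/to/index -fix

If you hit this again, it also helps to record IndexWriter's merge and
flush activity so a bad segment can be traced back to the merge that
produced it.  Here is a minimal sketch using IndexWriter.setInfoStream
(the wrapper class, method name, and log file name are illustrative,
not from this thread):

    import java.io.FileOutputStream;
    import java.io.IOException;
    import java.io.PrintStream;

    import org.apache.lucene.index.IndexWriter;

    public class IndexDebug {
        // Attach a log stream so every merge/flush is recorded; if
        // the index corrupts again, the log shows which merges
        // touched which segments.
        public static void enableMergeLogging(IndexWriter writer)
                throws IOException {
            PrintStream log = new PrintStream(
                new FileOutputStream("indexwriter.log", true));
            writer.setInfoStream(log);
        }
    }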
vivek sar wrote:

> Hi,
>
> We ran into the same issue (corrupted index) using Lucene 2.4.0.
> There was no outage or system reboot -- not sure how it could get
> corrupted.  Here is the exception:
>
> Caused by: java.io.IOException: background merge hit exception:
> _io5:c66777491 _nh9:c10656736 _taq:c2021563 _s8m:c1421051
> _uh5:c2065961 _r0y:c1124653 _s4s:c2477731 _t6w:c4340938 _ucx:c8018451
> _xkb:c13842776 _xkd:c3394 _xke:c3379 _xkg:c1231 _xkh:c1252 _xkj:c1680
> _xkk:c1689 into _xkl [optimize]
>     at org.apache.lucene.index.IndexWriter.optimize(IndexWriter.java:2258)
>     at org.apache.lucene.index.IndexWriter.optimize(IndexWriter.java:2194)
>     at com.packetmotion.manager.query.fulltext.index.Indexer.optimizeMasterIndex(Indexer.java:887)
>     at com.packetmotion.manager.query.fulltext.index.Indexer.createNewIndexPartition(Indexer.java:783)
>     at com.packetmotion.manager.query.fulltext.index.Indexer.indexByDevice(Indexer.java:582)
>     at com.packetmotion.manager.query.fulltext.index.Indexer.indexData(Indexer.java:440)
>     at com.packetmotion.manager.query.fulltext.index.Indexer$$FastClassByCGLIB$$97fb7e9b.invoke()
>     at net.sf.cglib.proxy.MethodProxy.invoke(MethodProxy.java:149)
>     at org.springframework.aop.framework.Cglib2AopProxy$DynamicAdvisedInterceptor.intercept(Cglib2AopProxy.java:628)
>     at com.packetmotion.manager.query.fulltext.index.Indexer$$EnhancerByCGLIB$$ebbe3914.indexData()
>     at com.packetmotion.manager.query.fulltext.index.IndexerJob$1.run(IndexerJob.java:38)
>     ... 8 more
> Caused by: java.lang.IndexOutOfBoundsException: Index: 51, Size: 26
>     at java.util.ArrayList.RangeCheck(ArrayList.java:547)
>     at java.util.ArrayList.get(ArrayList.java:322)
>     at org.apache.lucene.index.FieldInfos.fieldInfo(FieldInfos.java:265)
>     at org.apache.lucene.index.FieldsReader.doc(FieldsReader.java:185)
>     at org.apache.lucene.index.SegmentReader.document(SegmentReader.java:729)
>     at org.apache.lucene.index.SegmentMerger.mergeFields(SegmentMerger.java:359)
>     at org.apache.lucene.index.SegmentMerger.merge(SegmentMerger.java:139)
>     at org.apache.lucene.index.IndexWriter.mergeMiddle(IndexWriter.java:4226)
>     at org.apache.lucene.index.IndexWriter.merge(IndexWriter.java:3877)
>     at org.apache.lucene.index.ConcurrentMergeScheduler.doMerge(ConcurrentMergeScheduler.java:205)
>     at org.apache.lucene.index.ConcurrentMergeScheduler$MergeThread.run(ConcurrentMergeScheduler.java:260)
>
> I ran the checkIndex tool to fix the corrupted index.
> Here is its output:
>
> Opening index @ /opt/ps_manager/apps/conf/index/MasterIndex
>
> Segments file=segments_1587 numSegments=18 version=FORMAT_HAS_PROX [Lucene 2.4]
>   1 of 18: name=_io5 docCount=66777491
>     compound=true
>     hasProx=true
>     numFiles=1
>     size (MB)=6,680.574
>     no deletions
>     test: open reader.........OK
>     test: fields, norms.......OK [30 fields]
>     test: terms, freq, prox...OK [368415 terms; 1761057989 terms/docs pairs; 2095636359 tokens]
>     test: stored fields.......OK [66777491 total field count; avg 1 fields per doc]
>     test: term vectors........OK [0 total vector count; avg 0 term/freq vector fields per doc]
>
>   2 of 18: name=_nh9 docCount=10656736
>     compound=true
>     hasProx=true
>     numFiles=1
>     size (MB)=1,058.869
>     no deletions
>     test: open reader.........OK
>     test: fields, norms.......OK [30 fields]
>     test: terms, freq, prox...OK [118964 terms; 278176718 terms/docs pairs; 329825350 tokens]
>     test: stored fields.......OK [10656736 total field count; avg 1 fields per doc]
>     test: term vectors........OK [0 total vector count; avg 0 term/freq vector fields per doc]
>
>   3 of 18: name=_taq docCount=2021563
>     compound=true
>     hasProx=true
>     numFiles=1
>     size (MB)=208.544
>     no deletions
>     test: open reader.........OK
>     test: fields, norms.......OK [30 fields]
>     test: terms, freq, prox...OK [145494 terms; 54131697 terms/docs pairs; 65411701 tokens]
>     test: stored fields.......OK [2021563 total field count; avg 1 fields per doc]
>     test: term vectors........OK [0 total vector count; avg 0 term/freq vector fields per doc]
>
>   4 of 18: name=_s8m docCount=1421051
>     compound=true
>     hasProx=true
>     numFiles=1
>     size (MB)=162.443
>     no deletions
>     test: open reader.........OK
>     test: fields, norms.......OK [30 fields]
>     test: terms, freq, prox...OK [210276 terms; 42491363 terms/docs pairs; 53054214 tokens]
>     test: stored fields.......OK [1421051 total field count; avg 1 fields per doc]
>     test: term vectors........OK [0 total vector count; avg 0 term/freq vector fields per doc]
>
>   5 of 18: name=_uh5 docCount=2065961
>     compound=true
>     hasProx=true
>     numFiles=1
>     size (MB)=229.394
>     no deletions
>     test: open reader.........OK
>     test: fields, norms.......OK [30 fields]
>     test: terms, freq, prox...OK [126789 terms; 60416663 terms/docs pairs; 75120265 tokens]
>     test: stored fields.......OK [2065964 total field count; avg 1 fields per doc]
>     test: term vectors........OK [0 total vector count; avg 0 term/freq vector fields per doc]
>
>   6 of 18: name=_r0y docCount=1124653
>     compound=true
>     hasProx=true
>     numFiles=1
>     size (MB)=115.792
>     no deletions
>     test: open reader.........OK
>     test: fields, norms.......OK [26 fields]
>     test: terms, freq, prox...FAILED
>     WARNING: fixIndex() would remove reference to this segment; full exception:
>     java.lang.RuntimeException: term desthost:wir docFreq=6 != num docs seen 0 + num docs deleted 0
>         at org.apache.lucene.index.CheckIndex.checkIndex(CheckIndex.java:475)
>         at org.apache.lucene.index.CheckIndex.main(CheckIndex.java:676)
>
>   7 of 18: name=_s4s docCount=2477731
>     compound=true
>     hasProx=true
>     numFiles=1
>     size (MB)=230.698
>     no deletions
>     test: open reader.........OK
>     test: fields, norms.......OK [25 fields]
>     test: terms, freq, prox...OK [90327 terms; 60212096 terms/docs pairs; 72740383 tokens]
>     test: stored fields.......OK [2477731 total field count; avg 1 fields per doc]
>     test: term vectors........OK [0 total vector count; avg 0 term/freq vector fields per doc]
>   8 of 18: name=_t6w docCount=4340938
>     compound=true
>     hasProx=true
>     numFiles=1
>     size (MB)=451.389
>     no deletions
>     test: open reader.........OK
>     test: fields, norms.......OK [26 fields]
>     test: terms, freq, prox...OK [157817 terms; 121753990 terms/docs pairs; 151141568 tokens]
>     test: stored fields.......OK [4340938 total field count; avg 1 fields per doc]
>     test: term vectors........OK [0 total vector count; avg 0 term/freq vector fields per doc]
>
>   9 of 18: name=_ucx docCount=8018451
>     compound=true
>     hasProx=true
>     numFiles=1
>     size (MB)=845.968
>     no deletions
>     test: open reader.........OK
>     test: fields, norms.......OK [26 fields]
>     test: terms, freq, prox...OK [410057 terms; 227617455 terms/docs pairs; 283398975 tokens]
>     test: stored fields.......OK [8018451 total field count; avg 1 fields per doc]
>     test: term vectors........OK [0 total vector count; avg 0 term/freq vector fields per doc]
>
>   10 of 18: name=_xl9 docCount=13891933
>     compound=true
>     hasProx=true
>     numFiles=1
>     size (MB)=1,408.881
>     no deletions
>     test: open reader.........OK
>     test: fields, norms.......OK [26 fields]
>     test: terms, freq, prox...OK [535226 terms; 376394493 terms/docs pairs; 465003060 tokens]
>     test: stored fields.......OK [13891933 total field count; avg 1 fields per doc]
>     test: term vectors........OK [0 total vector count; avg 0 term/freq vector fields per doc]
>
>   11 of 18: name=_xlb docCount=3521
>     compound=true
>     hasProx=true
>     numFiles=1
>     size (MB)=0.411
>     no deletions
>     test: open reader.........OK
>     test: fields, norms.......OK [25 fields]
>     test: terms, freq, prox...OK [2070 terms; 108496 terms/docs pairs; 136365 tokens]
>     test: stored fields.......OK [3521 total field count; avg 1 fields per doc]
>     test: term vectors........OK [0 total vector count; avg 0 term/freq vector fields per doc]
>
>   12 of 18: name=_xlc docCount=3529
>     compound=true
>     hasProx=true
>     numFiles=1
>     size (MB)=0.412
>     no deletions
>     test: open reader.........OK
>     test: fields, norms.......OK [22 fields]
>     test: terms, freq, prox...OK [2214 terms; 108633 terms/docs pairs; 136384 tokens]
>     test: stored fields.......OK [3529 total field count; avg 1 fields per doc]
>     test: term vectors........OK [0 total vector count; avg 0 term/freq vector fields per doc]
>
>   13 of 18: name=_xle docCount=1401
>     compound=true
>     hasProx=true
>     numFiles=1
>     size (MB)=0.188
>     no deletions
>     test: open reader.........OK
>     test: fields, norms.......OK [21 fields]
>     test: terms, freq, prox...OK [1854 terms; 48703 terms/docs pairs; 62221 tokens]
>     test: stored fields.......OK [1401 total field count; avg 1 fields per doc]
>     test: term vectors........OK [0 total vector count; avg 0 term/freq vector fields per doc]
>
>   14 of 18: name=_xlf docCount=1399
>     compound=true
>     hasProx=true
>     numFiles=1
>     size (MB)=0.188
>     no deletions
>     test: open reader.........OK
>     test: fields, norms.......OK [21 fields]
>     test: terms, freq, prox...OK [1763 terms; 48910 terms/docs pairs; 63035 tokens]
>     test: stored fields.......OK [1399 total field count; avg 1 fields per doc]
>     test: term vectors........OK [0 total vector count; avg 0 term/freq vector fields per doc]
>
>   15 of 18: name=_xlh docCount=1727
>     compound=true
>     hasProx=true
>     numFiles=1
>     size (MB)=0.24
>     no deletions
>     test: open reader.........OK
>     test: fields, norms.......OK [21 fields]
>     test: terms, freq, prox...OK [2318 terms; 62596 terms/docs pairs; 80688 tokens]
>     test: stored fields.......OK [1727 total field count; avg 1 fields per doc]
>     test: term vectors........OK [0 total vector count; avg 0 term/freq vector fields per doc]
>   16 of 18: name=_xli docCount=1716
>     compound=true
>     hasProx=true
>     numFiles=1
>     size (MB)=0.237
>     no deletions
>     test: open reader.........OK
>     test: fields, norms.......OK [21 fields]
>     test: terms, freq, prox...OK [2264 terms; 61867 terms/docs pairs; 79497 tokens]
>     test: stored fields.......OK [1716 total field count; avg 1 fields per doc]
>     test: term vectors........OK [0 total vector count; avg 0 term/freq vector fields per doc]
>
>   17 of 18: name=_xlk docCount=2921
>     compound=true
>     hasProx=true
>     numFiles=1
>     size (MB)=0.364
>     no deletions
>     test: open reader.........OK
>     test: fields, norms.......OK [25 fields]
>     test: terms, freq, prox...OK [2077 terms; 96536 terms/docs pairs; 123166 tokens]
>     test: stored fields.......OK [2921 total field count; avg 1 fields per doc]
>     test: term vectors........OK [0 total vector count; avg 0 term/freq vector fields per doc]
>
>   18 of 18: name=_xll docCount=3876
>     compound=true
>     hasProx=true
>     numFiles=1
>     size (MB)=0.476
>     no deletions
>     test: open reader.........OK
>     test: fields, norms.......OK [21 fields]
>     test: terms, freq, prox...OK [2261 terms; 130104 terms/docs pairs; 166867 tokens]
>     test: stored fields.......OK [3876 total field count; avg 1 fields per doc]
>     test: term vectors........OK [0 total vector count; avg 0 term/freq vector fields per doc]
>
> WARNING: 1 broken segments (containing 1124653 documents) detected
> WARNING: 1124653 documents will be lost
>
> NOTE: will write new segments file in 5 seconds; this will remove
> 1124653 docs from the index. THIS IS YOUR LAST CHANCE TO CTRL+C!
>   5...
>   4...
>   3...
>   2...
>   1...
> Writing...
> OK
> Wrote new segments file "segments_1588"
>
> Any ideas how the index could have gotten corrupted?
>
> Thanks,
> -vivek
>
> On Sat, Jan 3, 2009 at 8:04 AM, Brian Whitman wrote:
>>> It's very strange that CheckIndex -fix did not resolve the issue.
>>> After fixing it, if you re-run CheckIndex on the index do you still
>>> see that original one broken segment present?  CheckIndex should
>>> have removed the reference to that one segment.
>>
>> I just ran it again, and it detected the same error and claimed to
>> fix it.  I then shut down the solr server (I wasn't sure if this
>> would be an issue), ran it a third time (where it again found and
>> claimed to fix the error), then a fourth where it did not find any
>> problems, and now the optimize() call on the running server does not
>> throw the merge exception.
>>
>>> Did this corruption happen only once?  (You mentioned hitting dups
>>> in the past... but did you also see corruption?)
>>
>> Not that we know of, but it's very likely we never noticed.  (The
>> only reason I discovered this was that our commits were taking
>> 20-40x longer on this index than on others.)

---------------------------------------------------------------------
To unsubscribe, e-mail: java-user-unsubscribe@lucene.apache.org
For additional commands, e-mail: java-user-help@lucene.apache.org