lucene-java-user mailing list archives

Site index · List index
Message view « Date » · « Thread »
Top « Date » · « Thread »
From "Ian Lea" <ian....@gmail.com>
Subject Re: CorruptIndexException with some versions of java
Date Tue, 18 Mar 2008 13:53:15 GMT
Documents are biblio records.  All have title, author etc. stored,
some have a few extra fields as well.  Typically around 25 fields per
doc.  The index is created with compound format, everything else as
default.

I've rerun the job until failure.  Different numbers this time, but
basically the same exception. In the failing pass it was trying to
load 100K docs.

Exception in thread "Thread-1"
org.apache.lucene.index.MergePolicy$MergeException:
org.apache.lucene.index.CorruptIndexException: doc counts differ for
segment _cb: fieldsReader shows 71663 but segmentInfo shows 71664
        at org.apache.lucene.index.ConcurrentMergeScheduler$MergeThread.run(ConcurrentMergeScheduler.java:271)
Caused by: org.apache.lucene.index.CorruptIndexException: doc counts
differ for segment _cb: fieldsReader shows 71663 but segmentInfo shows
 71664
        at org.apache.lucene.index.SegmentReader.initialize(SegmentReader.java:313)
        at org.apache.lucene.index.SegmentReader.get(SegmentReader.java:262)
        at org.apache.lucene.index.SegmentReader.get(SegmentReader.java:221)
        at org.apache.lucene.index.IndexWriter.mergeMiddle(IndexWriter.java:3099)
        at org.apache.lucene.index.IndexWriter.merge(IndexWriter.java:2834)
        at org.apache.lucene.index.ConcurrentMergeScheduler$MergeThread.run(ConcurrentMergeScheduler.java:240)

As mentioned, the index is loaded in chunks - the infostream from the
failing pass is attached. All infostreams from all 19 odd runs leading
up to the failure available as well if that would help.


Running with -ea doesn't seem to have made any difference.


--
Ian.


On Tue, Mar 18, 2008 at 12:09 PM, Michael McCandless
<lucene@mikemccandless.com> wrote:
>
>  Can you call IndexWriter.setInfoStream(...) and get the error to
>  happen and post back the resulting output?  And, turn on assertions
>  (java -ea) since that may catch the issue sooner.
>
>  Can you describe you are setting up IndexWriter (autoCommit,
>  compound, etc.), and what your documents are like?  Do your documents
>  have a fixed schema (same fields every time), or it varies such that
>  some documents have no stored fields and some do?
>
>  Mike
>
>
>
>  Ian Lea wrote:
>
>  > Hi
>  >
>  >
>  > When bulk loading into a new index I'm seeing this exception
>  >
>  > Exception in thread "Thread-1"
>  > org.apache.lucene.index.MergePolicy$MergeException:
>  > org.apache.lucene.index.CorruptIndexException: doc counts differ for
>  > segment _4l: fieldsReader shows 67861 but segmentInfo shows 67862
>  >       at org.apache.lucene.index.ConcurrentMergeScheduler$MergeThread.run
>  > (ConcurrentMergeScheduler.java:271)
>  > Caused by: org.apache.lucene.index.CorruptIndexException: doc counts
>  > differ for segment _4l: fieldsReader shows 67861 but segmentInfo shows
>  > 67862
>  >       at org.apache.lucene.index.SegmentReader.initialize
>  > (SegmentReader.java:313)
>  >       at org.apache.lucene.index.SegmentReader.get(SegmentReader.java:262)
>  >       at org.apache.lucene.index.SegmentReader.get(SegmentReader.java:221)
>  >       at org.apache.lucene.index.IndexWriter.mergeMiddle
>  > (IndexWriter.java:3093)
>  >       at org.apache.lucene.index.IndexWriter.merge(IndexWriter.java:2834)
>  >       at org.apache.lucene.index.ConcurrentMergeScheduler$MergeThread.run
>  > (ConcurrentMergeScheduler.java:240)
>  >
>  > when use java version 1.6.0_05-b13 or 1.6.0_04-b12 on linux, with
>  > lucene 2.3.0 or 2.3.1 or lucene-core-2.3-SNAPSHOT from yesterday.
>  >
>  > With java version 1.6.0_03-b05 things work fine.
>  >
>  > The exception happens a few hundred thousand documents into the load.
>  >
>  > A different program updating a different index with different data on
>  > a different server gave a similar error on version 1.6.0_05-b13 and
>  > lucene 2.3.0.
>  >
>  >
>  > Any ideas?  Is this maybe a known issue or am I missing something
>  > obvious?
>  >
>  >
>  >
>  > --
>  > Ian.
>  >
>  > ---------------------------------------------------------------------
>  > To unsubscribe, e-mail: java-user-unsubscribe@lucene.apache.org
>  > For additional commands, e-mail: java-user-help@lucene.apache.org
>  >
>
>
>  ---------------------------------------------------------------------
>  To unsubscribe, e-mail: java-user-unsubscribe@lucene.apache.org
>  For additional commands, e-mail: java-user-help@lucene.apache.org
>
>


Mime
View raw message