lucene-java-user mailing list archives

Site index · List index
Message view « Date » · « Thread »
Top « Date » · « Thread »
From "Gopikrishnan Subramani" <gopi.subram...@gmail.com>
Subject Re: index corruption with latest lucene
Date Tue, 06 May 2008 09:29:48 GMT
Thanks Mike. Sorry, I should have mentioned that I'm using 1.6.0_04. I
happened to look at the thread a while ago and used -Xbatch but that didn't
help which made me think may be it's a different issue. I'll try with -Xint
before downgrading to 1.6.0_03 to be doubly sure.

-Gopi


On 5/6/08, Michael McCandless <lucene@mikemccandless.com> wrote:
>
>
> Are you using JRE 1.6.0_04 or 1.6.0_05?
>
> This sounds exactly the same as this:
>
>    http://www.gossamer-threads.com/lists/lucene/java-user/59650
>
> If it is the same issue, which seems to be a bug in the hotspot compiler,
> downgrading to JRE 1.6.0_03, or running Java with -Xbatch (forces up-front
> compilation) or -Xint (disables compilation) works around it.
>
> Can you test either of these and report back?  Thanks.
>
> Mike
>
> Gopikrishnan Subramani wrote:
>
> > [ Sorry if I'm hijacking this thread, if you feel this error is
> > unrelated to
> > this thread, I'll move this to a separate thread. ]
> >
> > Even after upgrading to 2.3.1 I'm running into index corruption
> > problems.
> > I'm posting below the exception that is generated while searching. The
> > stack
> > trace looks like,
> >
> >
> > org.apache.lucene.index.CorruptIndexException: doc counts differ for
> > segment
> > _kk: fieldsReader shows 72670 but segmentInfo shows 72671
> >        at
> > org.apache.lucene.index.SegmentReader.initialize(SegmentReader.java:313)
> >        at
> > org.apache.lucene.index.SegmentReader.get(SegmentReader.java:262)
> >        at
> > org.apache.lucene.index.SegmentReader.get(SegmentReader.java:230)
> >        at
> >
> > org.apache.lucene.index.DirectoryIndexReader$1.doBody(DirectoryIndexReader.java:73)
> >        at
> >
> > org.apache.lucene.index.SegmentInfos$FindSegmentsFile.run(SegmentInfos.java:636)
> >        at
> >
> > org.apache.lucene.index.DirectoryIndexReader.open(DirectoryIndexReader.java:63)
> >        at org.apache.lucene.index.IndexReader.open(IndexReader.java:209)
> >        at org.apache.lucene.index.IndexReader.open(IndexReader.java:173)
> >        at
> > org.apache.lucene.search.IndexSearcher.<init>(IndexSearcher.java:48)
> >
> >
> > About the setup: all documents in the these indexes have the same set of
> > fields, but some fields are not added if the value is null. We have over
> > 500
> > indexes and they are indexed incrementally on a daily basis. The index
> > is
> > updated in place with autocommit turned on. A single thread writes to
> > the
> > index and once all the documents are updated, the index is commited.
> > About 5
> > indexes are getting corrupted per week on an average and a full index
> > fixes
> > the problem. This is proving to be a lot of pain and any help in
> > identifying
> > the problem is much appreicated.
> >
> > thanks,
> > Gopi
> >
> > On 5/6/08, Mark Miller <markrmiller@gmail.com> wrote:
> >
> > >
> > > I am getting even more confused. I luckily found a copy of one of the
> > > corrupted test indices that i had made on 4/28/08...lucky as its the
> > > only one I have ever made :) It doesn't have the problem. This is very
> > > interesting to me, because the other site that has the problem has
> > > been
> > > in action for months now. Both were running with my previous version
> > > of
> > > Lucene, which was a trunk build from around when 2.3 was released I
> > > think. Just seems odd that the test index was corrupted so recently.
> > >
> > >
> > > So I am a bit stuck...its probably my own problem though, so unless
> > > someone else sees it, Ill just report back if/when I find out more.
> > >
> > > - Mark
> > >
> > >
> > > On Mon, 2008-05-05 at 18:07 -0400, Michael McCandless wrote:
> > >
> > > > Mark,
> > > >
> > > > Which exact version of the JRE are you using?
> > > >
> > > > Mike
> > > >
> > > > Mark Miller wrote:
> > > >
> > > > > On Mon, 2008-05-05 at 17:26 -0400, Michael McCandless wrote:
> > > > >
> > > > > > Actually that stack trace looks like it's from trunk, not from
> > > > > > 2.3.2
> > > > > > (pre)?  OK, I think you said it's from "post 2.3 trunk".
> > > > > >
> > > > >
> > > > > Right...the Lucene that showed the problem was build from a trunk
> > > > > grab
> > > > > late last week. One of the problem indexes was built with a 2.0 or
> > > > > 2.1
> > > > > and the other was built with a post 2.3 trunk (but weeks (prob
> > > > > months)
> > > > > before the one i grabbed late last week :) )
> > > > >
> > > > >
> > > > > > Another question: is autoCommit false or true?
> > > > > >
> > > > > false
> > > > >
> > > > >
> > > > >
> > > > > If I can get you an affected index I will.
> > > > >
> > > > > - mark
> > > > >
> > > > >
> > > > >
> > > > > > More responses below:
> > > > > >
> > > > > > Mark Miller wrote:
> > > > > >
> > > > > > > On Mon, 2008-05-05 at 16:32 -0400, Michael McCandless wrote:
> > > > > > >
> > > > > > > > Hi Mark,
> > > > > > > >
> > > > > > > > Not good!
> > > > > > > >
> > > > > > > > Can you describe how this index was created?  Did
you use
> > > > > > > > multiple
> > > > > > > > threads on one IndexWriter?  Multiple sessions of
> > > > > > > > IndexWriter
> > > > > > > > appending to the index?  addIndexes*?  Is the index
copied
> > > > > > > > from one
> > > > > > > > place to another after being written and before being
> > > > > > > > searched?
> > > > > > > >
> > > > > > >
> > > > > > > Both sites were created by a single thread on a single
> > > > > > > IndexWriter.
> > > > > > > Updates are done through multiple threads and one IndexWriter.
> > > > > > > No
> > > > > > > addIndexes. Index was never copied, always same path.
> > > > > > >
> > > > > > >
> > > > > > > > If you run CheckIndex, what does it report?
> > > > > > > >
> > > > > > >
> > > > > > > This was my next move...unfortunately, someone accidentally
> > > > > > > kicked
> > > > > > > off a
> > > > > > > complete reindex before I could do it. From what I can
tell by
> > > > > > > the
> > > > > > > stack
> > > > > > > trace, its a per doc problem...I am guessing I could have
> > > > > > > printed the
> > > > > > > ids of the problem docs and just reindex those? I have
to deal
> > > > > > > with
> > > > > > > this
> > > > > > > at many other sites, so that may be my attack...I cannot
> > > > > > > reindex
> > > > > > > everything to fix.
> > > > > > >
> > > > > >
> > > > > > It would be great to know if that workaround works (and indeed
> > > > > > it's a
> > > > > > per-doc issue).  I'd also love to know how many docs are
> > > > > > affected,
> > > > > > when you hit this.
> > > > > >
> > > > > > If there's any way to zip up the index and send it to me, even
> > > > > > just
> > > > > > the files for the one segment that has the corrupted doc, that'd
> > > > > > be
> > > > > > great.
> > > > > >
> > > > > >
> > > > > > > > Any prior exceptions on this index?
> > > > > > > >
> > > > > > >
> > > > > > > Not that I can recall. One of the indexes was made months
ago,
> > > > > > > prob
> > > > > > > with
> > > > > > > a 2.0 or 2.1 Lucene, the second was made with a post 2.2
> > > > > > > Lucene. One
> > > > > > > site was windows 2003, the other AIX. One site was only
30,000
> > > > > > > docs, the
> > > > > > > other over 1 million.
> > > > > > >
> > > > > > >
> > > > > > > > Are your docs a variable schema (different fields)?
> > > > > > > >
> > > > > > >
> > > > > > > Yes. Lots of different fields depending on the doc.
> > > > > > >
> > > > > > >
> > > > > > > > Mike
> > > > > > > >
> > > > > > >
> > > > > > > Thanks Mike. I am currently trying to duplicate this. I
can't
> > > > > > > go to
> > > > > > > another site without testing some kind of fix.
> > > > > > >
> > > > > > >
> > > > > > > > Mark Miller wrote:
> > > > > > > >
> > > > > > > > > Yeah, its pretty close to 2.3.2, but I think
from last
> > > > > > > > > week mabye.
> > > > > > > > >
> > > > > > > > > I finally have one of the stack traces (this
comes on the
> > > > > > > > > tail
> > > > > > > > > complete
> > > > > > > > > laptop failure so I am scrambling here)
> > > > > > > > >
> > > > > > > > > java.lang.IndexOutOfBoundsException: Index: 97,
Size: 43
> > > > > > > > >        at
> > > > > > > > > java.util.ArrayList.RangeCheck(ArrayList.java:572)
> > > > > > > > >        at java.util.ArrayList.get(ArrayList.java:347)
> > > > > > > > >        at org.apache.lucene.index.FieldInfos.fieldInfo
> > > > > > > > > (FieldInfos.java:260)
> > > > > > > > >        at org.apache.lucene.index.FieldsReader.doc
> > > > > > > > > (FieldsReader.java:184)
> > > > > > > > >        at org.apache.lucene.index.SegmentReader.document
> > > > > > > > > (SegmentReader.java:670)
> > > > > > > > >        at
> > > > > > > > > org.apache.lucene.index.MultiSegmentReader.document
> > > > > > > > > (MultiSegmentReader.java:257)
> > > > > > > > >        at org.apache.lucene.search.IndexSearcher.doc
> > > > > > > > > (IndexSearcher.java:97)
> > > > > > > > >
> > > > > > > > > On Mon, 2008-05-05 at 14:48 -0500, crspan wrote:
> > > > > > > > >
> > > > > > > > > > coincidence or it is from 2.3.2 ?
> > > > > > > > > >
> > > > > > > > > > env:
> > > > > > > > > > lucene 2.3.2
> > > > > > > > > > jdk1.6.0_06 & jdk1.5.0_15
> > > > > > > > > >
> > > > > > > > > >
> > > > > > > > > > QueryString:
> > > > > > > > > > illeg^30.820824 technolog^22.290413 transfer^33.307804
> > > > > > > > > > Error: java.lang.ArrayIndexOutOfBoundsException:
> > > > > > > > > > 132704java.lang.ArrayIndexOutOfBoundsException:
132704
> > > > > > > > > > at
> > > > > > > > > >
> > > > > > > > > > org.apache.lucene.search.BooleanScorer2$Coordinator.coordFactor
> > > > > > > > > > (BooleanScorer2.java:55)
> > > > > > > > > > at org.apache.lucene.search.BooleanScorer2.score
> > > > > > > > > > (BooleanScorer2.java:358)
> > > > > > > > > > at org.apache.lucene.search.BooleanScorer2.score
> > > > > > > > > > (BooleanScorer2.java:320)
> > > > > > > > > > at org.apache.lucene.search.IndexSearcher.search
> > > > > > > > > > (IndexSearcher.java:146)
> > > > > > > > > > at org.apache.lucene.search.IndexSearcher.search
> > > > > > > > > > (IndexSearcher.java:113)
> > > > > > > > > > at
> > > > > > > > > > org.apache.lucene.search.Searcher.search(Searcher.java:132)
> > > > > > > > > > at
> > > > > > > > > > org.cr.search.TrecQueryRelevanceFeedback.main
> > > > > > > > > > (TrecQueryRelevanceFeedback.java:776)
> > > > > > > > > >
> > > > > > > > > >
> > > > > > > > > > QueryString:
> > > > > > > > > > oceanograph^68.48028 vessel^43.191563
> > > > > > > > > > Error:
> > > > > > > > > >
> > > > > > > > > > java.lang.ArrayIndexOutOfBoundsExceptionjava.lang.ArrayIndexOutOf
> > > > > > > > > > Bo
> > > > > > > > > > un
> > > > > > > > > > dsException
> > > > > > > > > > at java.lang.System.arraycopy(Native Method)
> > > > > > > > > > at
> > > > > > > > > > org.apache.lucene.index.TermVectorsReader.readTermVector
> > > > > > > > > > (TermVectorsReader.java:353)
> > > > > > > > > > at
> > > > > > > > > >
> > > > > > > > > > org.apache.lucene.index.TermVectorsReader.readTermVectors
> > > > > > > > > > (TermVectorsReader.java:287)
> > > > > > > > > > at org.apache.lucene.index.TermVectorsReader.get
> > > > > > > > > > (TermVectorsReader.java:232)
> > > > > > > > > > at
> > > > > > > > > > org.apache.lucene.index.SegmentReader.getTermFreqVectors
> > > > > > > > > > (SegmentReader.java:981)
> > > > > > > > > > at org.cr.rf.RelevanceFeedback.RelFeedbackWeight
> > > > > > > > > > (RelevanceFeedback.java:134)
> > > > > > > > > > at
> > > > > > > > > > org.cr.search.TrecQueryRelevanceFeedback.main
> > > > > > > > > > (TrecQueryRelevanceFeedback.java:781)
> > > > > > > > > >
> > > > > > > > > >
> > > > > > > > > >
> > > > > > > > > >
> > > > > > > > > > Mark Miller wrote:
> > > > > > > > > >
> > > > > > > > > > > Any recent changes that would expose
index corruption?
> > > > > > > > > > >
> > > > > > > > > > > I am getting two new errors when trying
to search:
> > > > > > > > > > >
> > > > > > > > > > > nullpointer fieldsreaders line 260
> > > > > > > > > > >
> > > > > > > > > > > indexoutofbounds on fieldinfo line
185
> > > > > > > > > > >
> > > > > > > > > > > I am kind of screwed, because reindexing
fixes this,
> > > > > > > > > > > but I cant
> > > > > > > > > > > reindex!
> > > > > > > > > > >
> > > > > > > > > > > Any ideas?
> > > > > > > > > > >
> > > > > > > > > > >
> > > > > > > > > > >
> > > > > > > > > > > ----------------------------------------------------------------
> > > > > > > > > > > --
> > > > > > > > > > > --
> > > > > > > > > > > -
> > > > > > > > > > > To unsubscribe, e-mail: java-user-
> > > > > > > > > > > unsubscribe@lucene.apache.org
> > > > > > > > > > > For additional commands, e-mail: java-user-
> > > > > > > > > > > help@lucene.apache.org
> > > > > > > > > > >
> > > > > > > > > > >
> > > > > > > > > > >
> > > > > > > > > >
> > > > > > > > > >
> > > > > > > > > > -----------------------------------------------------------------
> > > > > > > > > > --
> > > > > > > > > > --
> > > > > > > > > > To unsubscribe, e-mail: java-user-
> > > > > > > > > > unsubscribe@lucene.apache.org
> > > > > > > > > > For additional commands, e-mail: java-user-
> > > > > > > > > > help@lucene.apache.org
> > > > > > > > > >
> > > > > > > > > >
> > > > > > > > >
> > > > > > > > >
> > > > > > > > > ------------------------------------------------------------------
> > > > > > > > > --
> > > > > > > > > -
> > > > > > > > > To unsubscribe, e-mail:
> > > > > > > > > java-user-unsubscribe@lucene.apache.org
> > > > > > > > > For additional commands, e-mail: java-user-
> > > > > > > > > help@lucene.apache.org
> > > > > > > > >
> > > > > > > > >
> > > > > > > >
> > > > > > > >
> > > > > > > > -------------------------------------------------------------------
> > > > > > > > --
> > > > > > > > To unsubscribe, e-mail:
> > > > > > > > java-user-unsubscribe@lucene.apache.org
> > > > > > > > For additional commands, e-mail: java-user-
> > > > > > > > help@lucene.apache.org
> > > > > > > >
> > > > > > > >
> > > > > > >
> > > > > > >
> > > > > > > --------------------------------------------------------------------
> > > > > > > -
> > > > > > > To unsubscribe, e-mail:
> > > > > > > java-user-unsubscribe@lucene.apache.org
> > > > > > > For additional commands, e-mail:
> > > > > > > java-user-help@lucene.apache.org
> > > > > > >
> > > > > > >
> > > > > >
> > > > > >
> > > > > > ---------------------------------------------------------------------
> > > > > > To unsubscribe, e-mail: java-user-unsubscribe@lucene.apache.org
> > > > > > For additional commands, e-mail:
> > > > > > java-user-help@lucene.apache.org
> > > > > >
> > > > > >
> > > > >
> > > > >
> > > > > ---------------------------------------------------------------------
> > > > > To unsubscribe, e-mail: java-user-unsubscribe@lucene.apache.org
> > > > > For additional commands, e-mail: java-user-help@lucene.apache.org
> > > > >
> > > > >
> > > >
> > > >
> > > > ---------------------------------------------------------------------
> > > > To unsubscribe, e-mail: java-user-unsubscribe@lucene.apache.org
> > > > For additional commands, e-mail: java-user-help@lucene.apache.org
> > > >
> > > >
> > >
> > > ---------------------------------------------------------------------
> > > To unsubscribe, e-mail: java-user-unsubscribe@lucene.apache.org
> > > For additional commands, e-mail: java-user-help@lucene.apache.org
> > >
> > >
> > >
>
> ---------------------------------------------------------------------
> To unsubscribe, e-mail: java-user-unsubscribe@lucene.apache.org
> For additional commands, e-mail: java-user-help@lucene.apache.org
>
>

Mime
  • Unnamed multipart/alternative (inline, None, 0 bytes)
View raw message