From: Michael McCandless
To: java-user@lucene.apache.org
Subject: Re: index corruption with latest lucene
Date: Mon, 5 May 2008 17:26:54 -0400

Actually that stack trace looks like it's from trunk, not from 2.3.2
(pre)?  OK, I think you said it's from "post 2.3 trunk".

Another question: is autoCommit false or true?

More responses below:

Mark Miller wrote:

> On Mon, 2008-05-05 at 16:32 -0400, Michael McCandless wrote:
>> Hi Mark,
>>
>> Not good!
>>
>> Can you describe how this index was created?  Did you use multiple
>> threads on one IndexWriter?  Multiple sessions of IndexWriter
>> appending to the index?  addIndexes*?  Is the index copied from one
>> place to another after being written and before being searched?
>
> Both sites were created by a single thread on a single IndexWriter.
> Updates are done through multiple threads and one IndexWriter. No
> addIndexes. Index was never copied, always same path.
>
>>
>> If you run CheckIndex, what does it report?
>
> This was my next move...unfortunately, someone accidentally kicked off a
> complete reindex before I could do it. From what I can tell by the stack
> trace, it's a per-doc problem...I am guessing I could have printed the
> ids of the problem docs and just reindex those?
> I have to deal with this
> at many other sites, so that may be my attack...I cannot reindex
> everything to fix.

It would be great to know if that workaround works (and indeed it's a
per-doc issue).  I'd also love to know how many docs are affected
when you hit this.

If there's any way to zip up the index and send it to me, even just the
files for the one segment that has the corrupted doc, that'd be great.

>>
>> Any prior exceptions on this index?
>
> Not that I can recall. One of the indexes was made months ago, probably
> with a 2.0 or 2.1 Lucene, the second was made with a post 2.2 Lucene.
> One site was Windows 2003, the other AIX. One site was only 30,000 docs,
> the other over 1 million.
>
>>
>> Are your docs a variable schema (different fields)?
>
> Yes. Lots of different fields depending on the doc.
>
>>
>> Mike
>
> Thanks Mike. I am currently trying to duplicate this. I can't go to
> another site without testing some kind of fix.
>
>>
>> Mark Miller wrote:
>>> Yeah, it's pretty close to 2.3.2, but I think from last week maybe.
>>>
>>> I finally have one of the stack traces (this comes on the tail of a
>>> complete laptop failure so I am scrambling here)
>>>
>>> java.lang.IndexOutOfBoundsException: Index: 97, Size: 43
>>>     at java.util.ArrayList.RangeCheck(ArrayList.java:572)
>>>     at java.util.ArrayList.get(ArrayList.java:347)
>>>     at org.apache.lucene.index.FieldInfos.fieldInfo(FieldInfos.java:260)
>>>     at org.apache.lucene.index.FieldsReader.doc(FieldsReader.java:184)
>>>     at org.apache.lucene.index.SegmentReader.document(SegmentReader.java:670)
>>>     at org.apache.lucene.index.MultiSegmentReader.document(MultiSegmentReader.java:257)
>>>     at org.apache.lucene.search.IndexSearcher.doc(IndexSearcher.java:97)
>>>
>>> On Mon, 2008-05-05 at 14:48 -0500, crspan wrote:
>>>> Coincidence, or is it from 2.3.2?
>>>>
>>>> env:
>>>> lucene 2.3.2
>>>> jdk1.6.0_06 & jdk1.5.0_15
>>>>
>>>>
>>>> QueryString:
>>>> illeg^30.820824 technolog^22.290413 transfer^33.307804
>>>> Error: java.lang.ArrayIndexOutOfBoundsException: 132704
>>>> java.lang.ArrayIndexOutOfBoundsException: 132704
>>>>     at org.apache.lucene.search.BooleanScorer2$Coordinator.coordFactor(BooleanScorer2.java:55)
>>>>     at org.apache.lucene.search.BooleanScorer2.score(BooleanScorer2.java:358)
>>>>     at org.apache.lucene.search.BooleanScorer2.score(BooleanScorer2.java:320)
>>>>     at org.apache.lucene.search.IndexSearcher.search(IndexSearcher.java:146)
>>>>     at org.apache.lucene.search.IndexSearcher.search(IndexSearcher.java:113)
>>>>     at org.apache.lucene.search.Searcher.search(Searcher.java:132)
>>>>     at org.cr.search.TrecQueryRelevanceFeedback.main(TrecQueryRelevanceFeedback.java:776)
>>>>
>>>>
>>>> QueryString:
>>>> oceanograph^68.48028 vessel^43.191563
>>>> Error: java.lang.ArrayIndexOutOfBoundsException
>>>> java.lang.ArrayIndexOutOfBoundsException
>>>>     at java.lang.System.arraycopy(Native Method)
>>>>     at org.apache.lucene.index.TermVectorsReader.readTermVector(TermVectorsReader.java:353)
>>>>     at org.apache.lucene.index.TermVectorsReader.readTermVectors(TermVectorsReader.java:287)
>>>>     at org.apache.lucene.index.TermVectorsReader.get(TermVectorsReader.java:232)
>>>>     at org.apache.lucene.index.SegmentReader.getTermFreqVectors(SegmentReader.java:981)
>>>>     at org.cr.rf.RelevanceFeedback.RelFeedbackWeight(RelevanceFeedback.java:134)
>>>>     at org.cr.search.TrecQueryRelevanceFeedback.main(TrecQueryRelevanceFeedback.java:781)
>>>>
>>>>
>>>> Mark Miller wrote:
>>>>> Any recent changes that would expose index corruption?
>>>>>
>>>>> I am getting two new errors when trying to search:
>>>>>
>>>>> nullpointer fieldsreaders line 260
>>>>>
>>>>> indexoutofbounds on fieldinfo line 185
>>>>>
>>>>> I am kind of screwed, because reindexing fixes this, but I can't
>>>>> reindex!
>>>>>
>>>>> Any ideas?

---------------------------------------------------------------------
To unsubscribe, e-mail: java-user-unsubscribe@lucene.apache.org
For additional commands, e-mail: java-user-help@lucene.apache.org
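For anyone wanting to try the workaround discussed in this thread (print the ids of the problem docs and reindex only those), here is a minimal sketch of the scanning step. It is not code from the thread and it deliberately does not call Lucene: `CorruptDocScan`, `findUnreadableDocs`, and the `loader` parameter are hypothetical names, with `loader` standing in for whatever call loads a stored document by number in your setup. It also assumes a modern Java with lambdas, not the JDK 1.5/1.6 mentioned above.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.function.IntFunction;

/** Sketch only: collects document numbers whose stored fields fail to load. */
final class CorruptDocScan {

    private CorruptDocScan() {
        // static helper, not meant to be instantiated
    }

    /**
     * Tries to load every document number in [0, maxDoc) and returns the
     * numbers for which the loader throws.
     *
     * @param maxDoc one greater than the largest document number to probe
     * @param loader hypothetical stand-in for the per-document load call
     */
    static List<Integer> findUnreadableDocs(int maxDoc, IntFunction<?> loader) {
        List<Integer> unreadable = new ArrayList<>();
        for (int docNum = 0; docNum < maxDoc; docNum++) {
            try {
                loader.apply(docNum);
            } catch (RuntimeException e) {
                // Covers unchecked failures such as the IndexOutOfBoundsException
                // and NullPointerException reported in this thread.
                unreadable.add(docNum);
            }
        }
        return unreadable;
    }
}
```

Against a real index, the loader would presumably wrap a call such as `reader.document(docNum)`, skip deleted documents first, and rethrow the checked `IOException` as an unchecked exception; those details are assumptions to verify against the Lucene version in use. The returned list gives both the count of affected docs and the candidates for a targeted reindex.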