From: Michael Busch
Date: Mon, 05 May 2008 14:32:36 -0700
To: java-user@lucene.apache.org
Subject: Re: index corruption with latest lucene
Message-ID: <481F7CF4.9020402@gmail.com>

If that is the case then I will go ahead and publish the 2.3.2 release?
Have you seen this on 2.3.x, Mark?

-Michael

Michael McCandless wrote:
>
> Actually that stack trace looks like it's from trunk, not from
> 2.3.2(pre)? OK, I think you said it's from "post 2.3 trunk".
>
> Another question: is autoCommit false or true?
>
> More responses below:
>
> Mark Miller wrote:
>> On Mon, 2008-05-05 at 16:32 -0400, Michael McCandless wrote:
>>> Hi Mark,
>>>
>>> Not good!
>>>
>>> Can you describe how this index was created? Did you use multiple
>>> threads on one IndexWriter? Multiple sessions of IndexWriter
>>> appending to the index? addIndexes*? Is the index copied from one
>>> place to another after being written and before being searched?
>>
>> Both sites were created by a single thread on a single IndexWriter.
>> Updates are done through multiple threads and one IndexWriter. No
>> addIndexes. Index was never copied, always same path.
>>
>>>
>>> If you run CheckIndex, what does it report?
>>
>> This was my next move...unfortunately, someone accidentally kicked off a
>> complete reindex before I could do it. From what I can tell by the stack
>> trace, it's a per doc problem...I am guessing I could have printed the
>> ids of the problem docs and just reindexed those? I have to deal with this
>> at many other sites, so that may be my attack...I cannot reindex
>> everything to fix.
>
> It would be great to know if that workaround works (and indeed it's a
> per-doc issue). I'd also love to know how many docs are affected, when
> you hit this.
>
> If there's any way to zip up the index and send it to me, even just the
> files for the one segment that has the corrupted doc, that'd be great.
>
>>>
>>> Any prior exceptions on this index?
>>
>> Not that I can recall. One of the indexes was made months ago, probably with
>> a 2.0 or 2.1 Lucene, the second was made with a post 2.2 Lucene. One
>> site was Windows 2003, the other AIX. One site was only 30,000 docs, the
>> other over 1 million.
>>
>>>
>>> Are your docs a variable schema (different fields)?
>>
>> Yes. Lots of different fields depending on the doc.
>>
>>>
>>> Mike
>>
>> Thanks Mike. I am currently trying to duplicate this. I can't go to
>> another site without testing some kind of fix.
>>
>>>
>>> Mark Miller wrote:
>>>> Yeah, it's pretty close to 2.3.2, but I think from last week maybe.
>>>>
>>>> I finally have one of the stack traces (this comes on the tail of a
>>>> complete laptop failure, so I am scrambling here)
>>>>
>>>> java.lang.IndexOutOfBoundsException: Index: 97, Size: 43
>>>>     at java.util.ArrayList.RangeCheck(ArrayList.java:572)
>>>>     at java.util.ArrayList.get(ArrayList.java:347)
>>>>     at org.apache.lucene.index.FieldInfos.fieldInfo(FieldInfos.java:260)
>>>>     at org.apache.lucene.index.FieldsReader.doc(FieldsReader.java:184)
>>>>     at org.apache.lucene.index.SegmentReader.document(SegmentReader.java:670)
>>>>     at org.apache.lucene.index.MultiSegmentReader.document(MultiSegmentReader.java:257)
>>>>     at org.apache.lucene.search.IndexSearcher.doc(IndexSearcher.java:97)
>>>>
>>>> On Mon, 2008-05-05 at 14:48 -0500, crspan wrote:
>>>>> Coincidence, or is it from 2.3.2?
>>>>>
>>>>> env:
>>>>> lucene 2.3.2
>>>>> jdk1.6.0_06 & jdk1.5.0_15
>>>>>
>>>>>
>>>>> QueryString:
>>>>> illeg^30.820824 technolog^22.290413 transfer^33.307804
>>>>> Error: java.lang.ArrayIndexOutOfBoundsException: 132704
>>>>> java.lang.ArrayIndexOutOfBoundsException: 132704
>>>>>     at org.apache.lucene.search.BooleanScorer2$Coordinator.coordFactor(BooleanScorer2.java:55)
>>>>>     at org.apache.lucene.search.BooleanScorer2.score(BooleanScorer2.java:358)
>>>>>     at org.apache.lucene.search.BooleanScorer2.score(BooleanScorer2.java:320)
>>>>>     at org.apache.lucene.search.IndexSearcher.search(IndexSearcher.java:146)
>>>>>     at org.apache.lucene.search.IndexSearcher.search(IndexSearcher.java:113)
>>>>>     at org.apache.lucene.search.Searcher.search(Searcher.java:132)
>>>>>     at org.cr.search.TrecQueryRelevanceFeedback.main(TrecQueryRelevanceFeedback.java:776)
>>>>>
>>>>>
>>>>> QueryString:
>>>>> oceanograph^68.48028 vessel^43.191563
>>>>> Error: java.lang.ArrayIndexOutOfBoundsException
>>>>> java.lang.ArrayIndexOutOfBoundsException
>>>>>     at java.lang.System.arraycopy(Native Method)
>>>>>     at org.apache.lucene.index.TermVectorsReader.readTermVector(TermVectorsReader.java:353)
>>>>>     at org.apache.lucene.index.TermVectorsReader.readTermVectors(TermVectorsReader.java:287)
>>>>>     at org.apache.lucene.index.TermVectorsReader.get(TermVectorsReader.java:232)
>>>>>     at org.apache.lucene.index.SegmentReader.getTermFreqVectors(SegmentReader.java:981)
>>>>>     at org.cr.rf.RelevanceFeedback.RelFeedbackWeight(RelevanceFeedback.java:134)
>>>>>     at org.cr.search.TrecQueryRelevanceFeedback.main(TrecQueryRelevanceFeedback.java:781)
>>>>>
>>>>>
>>>>> Mark Miller wrote:
>>>>>> Any recent changes that would expose index corruption?
>>>>>>
>>>>>> I am getting two new errors when trying to search:
>>>>>>
>>>>>> nullpointer fieldsreaders line 260
>>>>>>
>>>>>> indexoutofbounds on fieldinfo line 185
>>>>>>
>>>>>> I am kind of screwed, because reindexing fixes this, but I can't
>>>>>> reindex!
>>>>>>
>>>>>> Any ideas?
>>>>>>
>>>>>>
>>>>>> ---------------------------------------------------------------------
>>>>>> To unsubscribe, e-mail: java-user-unsubscribe@lucene.apache.org
>>>>>> For additional commands, e-mail: java-user-help@lucene.apache.org
>>>>>>
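
A rough sketch of the workaround discussed in the thread (print the ids of
the problem docs and reindex only those). It assumes the failure really is
per document, which the thread has not confirmed. DocLoader is a hypothetical
stand-in, not a Lucene type: against a real index it would presumably wrap a
call such as IndexReader.document(int), with the loop bound taken from
IndexReader.maxDoc(). The fake loader in main only simulates two bad docs so
the example runs without an index.

```java
import java.util.ArrayList;
import java.util.List;

public class CorruptDocScan {

    /** Hypothetical stand-in for a per-document read (e.g. loading stored fields). */
    interface DocLoader {
        void load(int docId) throws Exception;
    }

    /**
     * Tries to load every doc id in [0, maxDoc) and returns the ids whose
     * load threw an exception. These are the candidates for selective reindexing.
     */
    static List<Integer> findUnreadableDocs(int maxDoc, DocLoader loader) {
        List<Integer> bad = new ArrayList<Integer>();
        for (int id = 0; id < maxDoc; id++) {
            try {
                loader.load(id);
            } catch (Exception e) {
                // Record the id and keep scanning instead of aborting on the first failure.
                bad.add(id);
            }
        }
        return bad;
    }

    public static void main(String[] args) {
        // Simulated index: docs 3 and 7 fail the way the reported trace does.
        DocLoader fake = id -> {
            if (id == 3 || id == 7) {
                throw new IndexOutOfBoundsException("Index: 97, Size: 43");
            }
        };
        System.out.println(findUnreadableDocs(10, fake)); // prints [3, 7]
    }
}
```

The scan catches and records rather than rethrowing, so one pass also answers
the "how many docs are affected" question. A real scan would also need to skip
deleted documents.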