Message-ID: <481F8525.1010604@gmail.com>
Date: Mon, 05 May 2008 15:07:33 -0700
From: Michael Busch
To: java-user@lucene.apache.org
Subject: Re: index corruption with latest lucene
In-Reply-To: <1210024964.11339.40.camel@m-laptop>

Yeah, it's probably confusing, because we currently commit patches to
two branches: the trunk (/repos/asf/lucene/java/trunk) and the 2.3
branch (/repos/asf/lucene/java/branches/lucene_2_3).

So if you checked out from the trunk, then this is not the 2.3.2
version. The 2.3.2 release candidate is from the 2.3 branch, revision
652650.

-Michael

Mark Miller wrote:
> Man, I have even confused myself on these versions at this point. Let me
> start over.
>
> I am having the problem with a version of lucene that was the trunk late
> last week. Which pretty much means 2.3.2.
>
> I'd hate to hold up the release if the problem was only me though.
> I am trying to work through it as fast as I can. I just have to find
> another index somewhere with the problem. It's just difficult because
> the indexes are very large and on remote live sites. I am hoping I can
> find another old test one with the problem or make one. The two installs
> where I detected the problem were rebuilt, one inadvertently.
>
> - Mark
>
> On Mon, 2008-05-05 at 14:32 -0700, Michael Busch wrote:
>> If that is the case then I will go ahead and publish the 2.3.2 release?
>> Have you seen this on 2.3.x, Mark?
>>
>> -Michael
>>
>> Michael McCandless wrote:
>>> Actually that stack trace looks like it's from trunk, not from
>>> 2.3.2(pre)? OK, I think you said it's from "post 2.3 trunk".
>>>
>>> Another question: is autoCommit false or true?
>>>
>>> More responses below:
>>>
>>> Mark Miller wrote:
>>>> On Mon, 2008-05-05 at 16:32 -0400, Michael McCandless wrote:
>>>>> Hi Mark,
>>>>>
>>>>> Not good!
>>>>>
>>>>> Can you describe how this index was created? Did you use multiple
>>>>> threads on one IndexWriter? Multiple sessions of IndexWriter
>>>>> appending to the index? addIndexes*? Is the index copied from one
>>>>> place to another after being written and before being searched?
>>>> Both sites were created by a single thread on a single IndexWriter.
>>>> Updates are done through multiple threads and one IndexWriter. No
>>>> addIndexes. Index was never copied, always same path.
>>>>
>>>>> If you run CheckIndex, what does it report?
>>>> This was my next move...unfortunately, someone accidentally kicked off a
>>>> complete reindex before I could do it. From what I can tell by the stack
>>>> trace, it's a per doc problem...I am guessing I could have printed the
>>>> ids of the problem docs and just reindex those? I have to deal with this
>>>> at many other sites, so that may be my attack...I cannot reindex
>>>> everything to fix.
>>> It would be great to know if that workaround works (and indeed it's a
>>> per-doc issue).
>>> I'd also love to know how many docs are affected, when
>>> you hit this.
>>>
>>> If there's any way to zip up the index and send it to me, even just the
>>> files for the one segment that has the corrupted doc, that'd be great.
>>>
>>>>> Any prior exceptions on this index?
>>>> Not that I can recall. One of the indexes was made months ago, prob with
>>>> a 2.0 or 2.1 Lucene, the second was made with a post 2.2 Lucene. One
>>>> site was windows 2003, the other AIX. One site was only 30,000 docs, the
>>>> other over 1 million.
>>>>
>>>>> Are your docs a variable schema (different fields)?
>>>> Yes. Lots of different fields depending on the doc.
>>>>
>>>>> Mike
>>>> Thanks Mike. I am currently trying to duplicate this. I can't go to
>>>> another site without testing some kind of fix.
>>>>
>>>>> Mark Miller wrote:
>>>>>> Yeah, it's pretty close to 2.3.2, but I think from last week maybe.
>>>>>>
>>>>>> I finally have one of the stack traces (this comes on the tail of a
>>>>>> complete laptop failure so I am scrambling here)
>>>>>>
>>>>>> java.lang.IndexOutOfBoundsException: Index: 97, Size: 43
>>>>>>     at java.util.ArrayList.RangeCheck(ArrayList.java:572)
>>>>>>     at java.util.ArrayList.get(ArrayList.java:347)
>>>>>>     at org.apache.lucene.index.FieldInfos.fieldInfo(FieldInfos.java:260)
>>>>>>     at org.apache.lucene.index.FieldsReader.doc(FieldsReader.java:184)
>>>>>>     at org.apache.lucene.index.SegmentReader.document(SegmentReader.java:670)
>>>>>>     at org.apache.lucene.index.MultiSegmentReader.document(MultiSegmentReader.java:257)
>>>>>>     at org.apache.lucene.search.IndexSearcher.doc(IndexSearcher.java:97)
>>>>>>
>>>>>> On Mon, 2008-05-05 at 14:48 -0500, crspan wrote:
>>>>>>> coincidence or is it from 2.3.2?
>>>>>>>
>>>>>>> env:
>>>>>>> lucene 2.3.2
>>>>>>> jdk1.6.0_06 & jdk1.5.0_15
>>>>>>>
>>>>>>>
>>>>>>> QueryString:
>>>>>>> illeg^30.820824 technolog^22.290413 transfer^33.307804
>>>>>>> Error: java.lang.ArrayIndexOutOfBoundsException: 132704
>>>>>>> java.lang.ArrayIndexOutOfBoundsException: 132704
>>>>>>>     at org.apache.lucene.search.BooleanScorer2$Coordinator.coordFactor(BooleanScorer2.java:55)
>>>>>>>     at org.apache.lucene.search.BooleanScorer2.score(BooleanScorer2.java:358)
>>>>>>>     at org.apache.lucene.search.BooleanScorer2.score(BooleanScorer2.java:320)
>>>>>>>     at org.apache.lucene.search.IndexSearcher.search(IndexSearcher.java:146)
>>>>>>>     at org.apache.lucene.search.IndexSearcher.search(IndexSearcher.java:113)
>>>>>>>     at org.apache.lucene.search.Searcher.search(Searcher.java:132)
>>>>>>>     at org.cr.search.TrecQueryRelevanceFeedback.main(TrecQueryRelevanceFeedback.java:776)
>>>>>>>
>>>>>>>
>>>>>>> QueryString:
>>>>>>> oceanograph^68.48028 vessel^43.191563
>>>>>>> Error: java.lang.ArrayIndexOutOfBoundsException
>>>>>>> java.lang.ArrayIndexOutOfBoundsException
>>>>>>>     at java.lang.System.arraycopy(Native Method)
>>>>>>>     at org.apache.lucene.index.TermVectorsReader.readTermVector(TermVectorsReader.java:353)
>>>>>>>     at org.apache.lucene.index.TermVectorsReader.readTermVectors(TermVectorsReader.java:287)
>>>>>>>     at org.apache.lucene.index.TermVectorsReader.get(TermVectorsReader.java:232)
>>>>>>>     at org.apache.lucene.index.SegmentReader.getTermFreqVectors(SegmentReader.java:981)
>>>>>>>     at org.cr.rf.RelevanceFeedback.RelFeedbackWeight(RelevanceFeedback.java:134)
>>>>>>>     at org.cr.search.TrecQueryRelevanceFeedback.main(TrecQueryRelevanceFeedback.java:781)
>>>>>>>
>>>>>>>
>>>>>>> Mark Miller wrote:
>>>>>>>> Any recent changes that would expose index corruption?
>>>>>>>>
>>>>>>>> I am getting two new errors when trying to search:
>>>>>>>>
>>>>>>>> nullpointer fieldsreaders line 260
>>>>>>>>
>>>>>>>> indexoutofbounds on fieldinfo line 185
>>>>>>>>
>>>>>>>> I am kind of screwed, because reindexing fixes this, but I can't
>>>>>>>> reindex!
>>>>>>>>
>>>>>>>> Any ideas?
>>>>>>>>
>>>>>>>> ---------------------------------------------------------------------
>>>>>>>> To unsubscribe, e-mail: java-user-unsubscribe@lucene.apache.org
>>>>>>>> For additional commands, e-mail: java-user-help@lucene.apache.org
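The per-document workaround floated in the thread (print the ids of the problem docs, then reindex only those) could be sketched roughly as below. This is only an illustration under stated assumptions, not code from the thread: `CorruptDocScan`, `findUnreadableDocs`, and the `loadDoc` callback are hypothetical names, and the callback stands in for whatever actually loads a stored document (with Lucene 2.3 that would presumably wrap `IndexReader.document(int)` and skip deleted ids). It also uses newer Java syntax than was current in 2008.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.function.IntConsumer;

// Hypothetical helper: not part of Lucene.
class CorruptDocScan {

    /**
     * Tries to load every doc id in [0, maxDoc) through loadDoc and
     * returns the ids for which the load threw a RuntimeException
     * (for example the IndexOutOfBoundsException shown in the trace).
     */
    static List<Integer> findUnreadableDocs(int maxDoc, IntConsumer loadDoc) {
        List<Integer> bad = new ArrayList<>();
        for (int id = 0; id < maxDoc; id++) {
            try {
                loadDoc.accept(id);   // assumption: wraps the real stored-field load
            } catch (RuntimeException e) {
                bad.add(id);          // remember the id so only this doc is reindexed
            }
        }
        return bad;
    }

    public static void main(String[] args) {
        // Simulated reader: doc 3 fails the way the trace in the thread does.
        List<Integer> bad = findUnreadableDocs(5, id -> {
            if (id == 3) {
                throw new IndexOutOfBoundsException("Index: 97, Size: 43");
            }
        });
        System.out.println("unreadable doc ids: " + bad);
    }
}
```

Against a real index the callback would also need to wrap checked I/O exceptions, and whether reindexing just the reported ids actually repairs the index is exactly the open question in the thread.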