Date: Mon, 7 Feb 2011 09:49:30 +0000 (UTC)
From: "Nick Pellow (JIRA)"
To: dev@lucene.apache.org
Subject: [jira] Commented: (LUCENE-2666) ArrayIndexOutOfBoundsException when iterating over TermDocs

    [ https://issues.apache.org/jira/browse/LUCENE-2666?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12991315#comment-12991315 ]

Nick Pellow commented on
LUCENE-2666:
-------------------------------------

Hi Michael,

This issue was entirely a problem with our code, and I doubt Lucene could have done a better job.

The problem was that on upgrade of the index (done when fields have changed, etc.), we recreate the index in the same location using {{IndexWriter.create(directory, analyzer, true, MAX_FIELD_LENGTH)}}. However, some code had been added just before this call that deleted every single file in the index directory. This meant that another thread performing a search could have seen a corrupt index, thus causing the AIOOBE. The developer was paranoid that IndexWriter.create was leaving old files lying around.

I'm glad we got to the bottom of this, and even more glad that it was not a bug in Lucene! Thanks again for helping us track this down.

Best Regards,
Nick Pellow

> ArrayIndexOutOfBoundsException when iterating over TermDocs
> -----------------------------------------------------------
>
> Key: LUCENE-2666
> URL: https://issues.apache.org/jira/browse/LUCENE-2666
> Project: Lucene - Java
> Issue Type: Bug
> Components: Index
> Affects Versions: 3.0.2
> Reporter: Shay Banon
> Attachments: checkindex-out.txt
>
>
> A user got this very strange exception, and I managed to get the index that it happens on. Basically, iterating over the TermDocs causes an AIOOB exception. I easily reproduced it using the FieldCache, which does exactly that (the field in question is indexed as numeric).
>
> Here is the exception:
>
> Exception in thread "main" java.lang.ArrayIndexOutOfBoundsException: 114
>     at org.apache.lucene.util.BitVector.get(BitVector.java:104)
>     at org.apache.lucene.index.SegmentTermDocs.next(SegmentTermDocs.java:127)
>     at org.apache.lucene.search.FieldCacheImpl$LongCache.createValue(FieldCacheImpl.java:501)
>     at org.apache.lucene.search.FieldCacheImpl$Cache.get(FieldCacheImpl.java:183)
>     at org.apache.lucene.search.FieldCacheImpl.getLongs(FieldCacheImpl.java:470)
>     at TestMe.main(TestMe.java:56)
>
> It happens on the following segment: _26t docCount: 914 delCount: 1 delFileName: _26t_1.del
>
> As you can see, it smells like a corner case: it fails for document number 912, and the AIOOB is thrown while checking the deleted docs. The code to recreate it is simple:
>
> FSDirectory dir = FSDirectory.open(new File("index"));
> IndexReader reader = IndexReader.open(dir, true);
> IndexReader[] subReaders = reader.getSequentialSubReaders();
> for (IndexReader subReader : subReaders) {
>     Field field = subReader.getClass().getSuperclass().getDeclaredField("si");
>     field.setAccessible(true);
>     SegmentInfo si = (SegmentInfo) field.get(subReader);
>     System.out.println("--> " + si);
>     if (si.getDocStoreSegment().contains("_26t")) {
>         // this is the problematic one...
>         System.out.println("problematic one...");
>         FieldCache.DEFAULT.getLongs(subReader, "__documentdate", FieldCache.NUMERIC_UTILS_LONG_PARSER);
>     }
> }
>
> Here is the CheckIndex output for that segment:
>
>   8 of 10: name=_26t docCount=914
>     compound=true
>     hasProx=true
>     numFiles=2
>     size (MB)=1.641
>     diagnostics = {optimize=false, mergeFactor=10, os.version=2.6.18-194.11.1.el5.centos.plus, os=Linux, mergeDocStores=true, lucene.version=3.0.2 953716 - 2010-06-11 17:13:53, source=merge, os.arch=amd64, java.version=1.6.0, java.vendor=Sun Microsystems Inc.}
>     has deletions [delFileName=_26t_1.del]
>     test: open reader.........OK [1 deleted docs]
>     test: fields..............OK [32 fields]
>     test: field norms.........OK [32 fields]
>     test: terms, freq, prox...ERROR [114]
> java.lang.ArrayIndexOutOfBoundsException: 114
>     at org.apache.lucene.util.BitVector.get(BitVector.java:104)
>     at org.apache.lucene.index.SegmentTermDocs.next(SegmentTermDocs.java:127)
>     at org.apache.lucene.index.SegmentTermPositions.next(SegmentTermPositions.java:102)
>     at org.apache.lucene.index.CheckIndex.testTermIndex(CheckIndex.java:616)
>     at org.apache.lucene.index.CheckIndex.checkIndex(CheckIndex.java:509)
>     at org.apache.lucene.index.CheckIndex.checkIndex(CheckIndex.java:299)
>     at TestMe.main(TestMe.java:47)
>     test: stored fields.......ERROR [114]
> java.lang.ArrayIndexOutOfBoundsException: 114
>     at org.apache.lucene.util.BitVector.get(BitVector.java:104)
>     at org.apache.lucene.index.ReadOnlySegmentReader.isDeleted(ReadOnlySegmentReader.java:34)
>     at org.apache.lucene.index.CheckIndex.testStoredFields(CheckIndex.java:684)
>     at org.apache.lucene.index.CheckIndex.checkIndex(CheckIndex.java:512)
>     at org.apache.lucene.index.CheckIndex.checkIndex(CheckIndex.java:299)
>     at TestMe.main(TestMe.java:47)
>     test: term vectors........ERROR [114]
> java.lang.ArrayIndexOutOfBoundsException: 114
>     at org.apache.lucene.util.BitVector.get(BitVector.java:104)
>     at org.apache.lucene.index.ReadOnlySegmentReader.isDeleted(ReadOnlySegmentReader.java:34)
>     at org.apache.lucene.index.CheckIndex.testTermVectors(CheckIndex.java:721)
>     at org.apache.lucene.index.CheckIndex.checkIndex(CheckIndex.java:515)
>     at org.apache.lucene.index.CheckIndex.checkIndex(CheckIndex.java:299)
>     at TestMe.main(TestMe.java:47)
>
> The index is created with nothing fancy (all defaults), though it does use the near-real-time feature (IndexWriter#getReader), which complicates deleted-docs handling. It seems the deleted docs were written with a size that does not match the segment's number of docs? Sadly, I don't have a test that recreates it from scratch, but I do have the index if someone wants to look at it (mail me directly and I will provide a download link).
> I will continue to investigate why this might happen; I'm just wondering whether anyone has run into this exception before. Lucene 3.0.2 is used.
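The numbers in the report line up with a byte-indexed deleted-docs vector: doc 912's bit lives in byte 912 >> 3 = 114, which is exactly the index in every AIOOBE above. The sketch below is a simplified, hypothetical stand-in for Lucene 3.0's BitVector, assuming one bit per document packed eight to a byte in an array of (size >> 3) + 1 bytes; MiniBitVector, liveDocs and coversSegment are illustration-only names, not Lucene APIs. It shows a TermDocs-style loop skipping deleted docs, and how a deletions vector sized for 911 or fewer docs, paired with the 914-doc segment _26t, fails on doc 912 at index 114.

```java
import java.util.ArrayList;
import java.util.List;

// Hypothetical, simplified stand-in for Lucene 3.0's deleted-docs BitVector.
// MiniBitVector, liveDocs and coversSegment are illustration-only names, not Lucene APIs.
public class DeletedDocsSketch {

    static final class MiniBitVector {
        final int size;    // number of docs this vector was created for
        final byte[] bits; // one bit per doc, eight docs per byte

        MiniBitVector(int size) {
            this.size = size;
            this.bits = new byte[(size >> 3) + 1]; // assumed (size >> 3) + 1 byte layout
        }

        void set(int doc) {
            bits[doc >> 3] |= 1 << (doc & 7);
        }

        // Only the array access guards the lookup, so a doc id beyond the
        // allocated bytes surfaces as ArrayIndexOutOfBoundsException at index doc >> 3.
        boolean get(int doc) {
            return (bits[doc >> 3] & (1 << (doc & 7))) != 0;
        }
    }

    // Mimics a TermDocs-style loop: walk the posting doc ids, skipping deleted ones.
    static List<Integer> liveDocs(int[] postings, MiniBitVector deleted) {
        List<Integer> live = new ArrayList<>();
        for (int doc : postings) {
            if (!deleted.get(doc)) {
                live.add(doc);
            }
        }
        return live;
    }

    // Hypothetical sanity check: the deletions vector must be sized for the whole segment.
    static boolean coversSegment(MiniBitVector deleted, int docCount) {
        return deleted.size == docCount;
    }

    public static void main(String[] args) {
        int docCount = 914;           // segment _26t in the report
        System.out.println(912 >> 3); // byte holding doc 912's bit: 114

        MiniBitVector matching = new MiniBitVector(docCount); // 115 bytes
        matching.set(3); // one deleted doc (id chosen arbitrarily)
        System.out.println(liveDocs(new int[] {3, 912}, matching)); // [912]

        MiniBitVector tooSmall = new MiniBitVector(911); // hypothetical mismatch: 114 bytes
        System.out.println(coversSegment(tooSmall, docCount)); // false
        try {
            liveDocs(new int[] {3, 912}, tooSmall);
        } catch (ArrayIndexOutOfBoundsException e) {
            // Message format varies by JDK; Java 6 printed just the index.
            System.out.println("AIOOBE for doc 912: " + e.getMessage());
        }
    }
}
```

In this sketch, any vector built for 912 or more docs has at least 115 bytes and does not throw for doc 912, which is why the failure points at a deletions file whose recorded size is smaller than the segment it is paired with, consistent with a .del file and segment that do not belong together.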