To: java-user@lucene.apache.org
Subject: Re: lucene-core-2.2.0.jar broken? CorruptIndexException?
Comments: In-reply-to Grant Ingersoll message dated "Thu, 29 Nov 2007 12:53:50 -0800."
Date: Thu, 29 Nov 2007 14:15:54 PST
From: Bill Janssen

So, it's a little clearer. I get the Array-out-of-bounds exception when I'm re-indexing documents that are already indexed, that is, when deletions are involved. I get the CorruptIndexException when I'm indexing from scratch, with no deletions.

Here's an example of the latter (with the latest nightly build). I removed the existing index, then reindexed the collection six UpLib docs at a time until I hit the corruption.

Bill

/Library/Java/Home/bin/java -Dcom.parc.uplib.indexing.debugMode=true "-Dcom.parc.uplib.indexing.indexProperties=contents:title:categories$,*:date@:apparent-mime-type*:authors$\sand\s:comment:abstract:email-message-id*:email-guid*:email-subject:email-from-name:email-from-address*:email-attachment-to*:email-thread-index*:email-references$,*:email-in-reply-to$,*:keywords$,*:album:performer:composer:music-genre*:audio-length:accompaniment:paragraph-ids$,*:sha-hash*" -classpath "/local/uplib/share/UpLib-1.7/code/lucene-core-2.3-2007-11-29_02-49-31.jar:/local/uplib/share/UpLib-1.7/code/LuceneIndexing.jar" -Dorg.apache.lucene.writeLockTimeout=20000 com.parc.uplib.indexing.LuceneIndexing "/local/janssen-uplib/index" update /local/janssen-uplib/docs 01113-86-6099-767 01113-86-5485-936 01113-86-0975-795 01113-62-2881-882 01113-44-7730-580 01113-44-7684-477

thr002: acquiring lock: LuceneIndex...
thr002: acquired lock: LuceneIndex*
thr002: releasing lock: LuceneIndex*
thr002: indexing output is stored/uncompressed,indexed stored/uncompressed,indexed>
Using charset utf8 for contents.txt
Using language en for contents.txt
page 0 (4219): Question: My chives have grown
Using charset utf8 for contents.txt
Using language en for contents.txt
Added 01113-86-6099-767 (2 versions)
Working on document /local/janssen-uplib/docs/01113-86-5485-936
Adding header 'abstract' IT to 01113-86-5485-936
Adding header 'apparent-mime-type' I to 01113-86-5485-936
Adding header 'authors' IT to 01113-86-5485-936
Adding header 'categories' I (paper) to 01113-86-5485-936
Adding header 'categories' I (sensepad) to 01113-86-5485-936
Adding header 'citation' I to 01113-86-5485-936
Adding header 'date' I (20040524) to 01113-86-5485-936
Adding header 'sha-hash' I to 01113-86-5485-936
Adding header 'title' IT (Designing Interaction, not Interfaces) to 01113-86-5485-936
Created empty doc Document stored/uncompressed,indexed stored/uncompressed,indexed>
Using charset utf8 for contents.txt
Using language en for contents.txt
page 0 (3855): Designing Interaction, not Int
page 1 (5688): Figure 1. Interaction as a phe
page 2 (5831): Interaction models can be eval
page 3 (5770): Reification turns concepts and
page 4 (5558): Figure 6. A mock-up of the DPI
page 5 (5963): In joint work with Yves Guiard
page 6 (6819): I propose making interactions
page 7 (5622): Graphical Application. Proc. A
Using charset utf8 for contents.txt
Using language en for contents.txt
Added 01113-86-5485-936 (9 versions)
Working on document /local/janssen-uplib/docs/01113-86-0975-795
Adding header 'apparent-mime-type' I to 01113-86-0975-795
Adding header 'categories' I (article) to 01113-86-0975-795
Adding header 'date' I (20050414) to 01113-86-0975-795
Adding header 'sha-hash' I to 01113-86-0975-795
Adding header 'source' IT to 01113-86-0975-795
Created empty doc Document stored/uncompressed,indexed stored/uncompressed,indexed>
Using charset utf8 for contents.txt
Using language en for contents.txt
page 0 (1851): About sponsorship Simplifying
page 1 (2900): Latvia and Lithuania, Estonia'
page 2 (3088): How much fairness is gained fo
page 3 (5317): At the time of its reform, Est
page 4 (1101): In part, the tax system is bur
Using charset utf8 for contents.txt
Using language en for contents.txt
Added 01113-86-0975-795 (6 versions)
Working on document /local/janssen-uplib/docs/01113-62-2881-882
Adding header 'apparent-mime-type' I to 01113-62-2881-882
Adding header 'categories' I (article) to 01113-62-2881-882
Adding header 'date' I (20050328) to 01113-62-2881-882
Adding header 'keywords' I (neuroeconomics) to 01113-62-2881-882
Adding header 'sha-hash' I to 01113-62-2881-882
Adding header 'title' IT (Neuroeconomics: Why Logic Often Takes a Backseat) to 01113-62-2881-882
Created empty doc Document stored/uncompressed,indexed stored/uncompressed,indexed>
Using charset utf8 for contents.txt
Using language en for contents.txt
page 0 (2957): Close Window MARCH 28, 2005 EC
page 1 (3856): these attacks on rationality ?
page 2 (484): Even believers in neuroeconomi
Using charset utf8 for contents.txt
Using language en for contents.txt
Added 01113-62-2881-882 (4 versions)
Working on document /local/janssen-uplib/docs/01113-44-7730-580
Adding header 'apparent-mime-type' I to 01113-44-7730-580
Adding header 'categories' I (flowport) to 01113-44-7730-580
Adding header 'categories' I (receipt) to 01113-44-7730-580
Adding header 'categories' I (flowport) to 01113-44-7730-580
Adding header 'categories' I (flowport) to 01113-44-7730-580
Adding header 'comment' IT to 01113-44-7730-580
Adding header 'sha-hash' I to 01113-44-7730-580
Adding header 'title' IT (fax receipt for JCDL 2005 demo submission document) to 01113-44-7730-580
Created empty doc Document stored/uncompressed,indexed stored/uncompressed,indexed>
Using charset utf8 for contents.txt
Using language en for contents.txt
page 0 (3383): Transmission fleport Dare T lo
page 1 (2697): ACM Permission and Release For
Using charset utf8 for contents.txt
Using language en for contents.txt
Added 01113-44-7730-580 (3 versions)
Working on document /local/janssen-uplib/docs/01113-44-7684-477
Adding header 'apparent-mime-type' I to 01113-44-7684-477
Adding header 'authors' IT to 01113-44-7684-477
Adding header 'categories' I (flowport) to 01113-44-7684-477
Adding header 'categories' I (article) to 01113-44-7684-477
Adding header 'categories' I (ebook) to 01113-44-7684-477
Adding header 'categories' I (flowport) to 01113-44-7684-477
Adding header 'categories' I (flowport) to 01113-44-7684-477
Adding header 'citation' I to 01113-44-7684-477
Adding header 'date' I (20050117) to 01113-44-7684-477
Adding header 'sha-hash' I to 01113-44-7684-477
Adding header 'title' IT (Signs of Life for E-books in 2004) to 01113-44-7684-477
Created empty doc Document stored/uncompressed,indexed stored/uncompressed,indexed>
Using charset utf8 for contents.txt
Using language en for contents.txt
page 0 (6514): Signs ofLife for E-books in 20
Using charset utf8 for contents.txt
Using language en for contents.txt
Added 01113-44-7684-477 (2 versions)
Optimizing...
IW 0 [main]: optimize: index now _94:c8686
IW 0 [main]: flush: segment=_95 docStoreSegment=_95 docStoreOffset=0 flushDocs=true flushDeletes=false flushDocStores=true numDocs=26 numBufDelTerms=0
IW 0 [main]: index before flush _94:c8686
closeDocStore: 2 files to flush to segment _95
flush postings as segment _95 numDocs=26
oldRAMSize=269104 newFlushedSize=114988 docs/MB=237.094 new/old=42.73%
IW 0 [main]: checkpoint: wrote segments file "segments_ic"
IFD [main]: now checkpoint "segments_ic" [2 segments ; isCommit = true]
IFD [main]: deleteCommits: now remove commit "segments_ib"
IFD [main]: delete "segments_ib"
IW 0 [main]: checkpoint: wrote segments file "segments_id"
IFD [main]: now checkpoint "segments_id" [2 segments ; isCommit = true]
IFD [main]: deleteCommits: now remove commit "segments_ic"
IFD [main]: delete "_95.fnm"
IFD [main]: delete "_95.frq"
IFD [main]: delete "_95.prx"
IFD [main]: delete "_95.tis"
IFD [main]: delete "_95.tii"
IFD [main]: delete "_95.nrm"
IFD [main]: delete "_95.fdx"
IFD [main]: delete "_95.fdt"
IFD [main]: delete "segments_ic"
IW 0 [main]: LMP: findMerges: 2 segments
IW 0 [main]: LMP: level 6.4501038 to 7.2001038: 1 segments
IW 0 [main]: LMP: level -1.0 to 5.070234: 1 segments
IW 0 [main]: CMS: now merge
IW 0 [main]: CMS: index: _94:c8686 _95:c26
IW 0 [main]: CMS: no more merges pending; now return
IW 0 [main]: add merge to pendingMerges: _94:c8686 _95:c26 [optimize] [total 1 pending]
IW 0 [main]: CMS: now merge
IW 0 [main]: CMS: index: _94:c8686 _95:c26
IW 0 [main]: CMS: consider merge _94:c8686 _95:c26 into _96 [optimize]
IW 0 [main]: CMS: launch new thread [Thread-0]
IW 0 [Thread-0]: CMS: merge thread: start
IW 0 [main]: CMS: no more merges pending; now return
IW 0 [Thread-0]: now merge merge=_94:c8686 _95:c26 into _96 [optimize] index=_94:c8686 _95:c26
IW 0 [Thread-0]: merging _94:c8686 _95:c26 into _96 [optimize]
IW 0 [Thread-0]: merge: total 8712 docs
IW 0 [Thread-0]: hit exception during merge; now refresh deleter on segment _96
IFD [Thread-0]: refresh [prefix=_96]: removing newly created unreferenced file "_96.fdt"
IFD [Thread-0]: delete "_96.fdt"
IFD [Thread-0]: refresh [prefix=_96]: removing newly created unreferenced file "_96.fdx"
IFD [Thread-0]: delete "_96.fdx"
IFD [Thread-0]: refresh [prefix=_96]: removing newly created unreferenced file "_96.fnm"
IFD [Thread-0]: delete "_96.fnm"
IFD [Thread-0]: refresh [prefix=_96]: removing newly created unreferenced file "_96.frq"
IFD [Thread-0]: delete "_96.frq"
IFD [Thread-0]: refresh [prefix=_96]: removing newly created unreferenced file "_96.prx"
IFD [Thread-0]: delete "_96.prx"
IFD [Thread-0]: refresh [prefix=_96]: removing newly created unreferenced file "_96.tii"
IFD [Thread-0]: delete "_96.tii"
IFD [Thread-0]: refresh [prefix=_96]: removing newly created unreferenced file "_96.tis"
IFD [Thread-0]: delete "_96.tis"
IW 0 [Thread-0]: hit exception during merge
java.io.IOException: background merge hit exception: _94:c8686 _95:c26 into _96 [optimize]
    at org.apache.lucene.index.IndexWriter.optimize(IndexWriter.java:1705)
    at org.apache.lucene.index.IndexWriter.optimize(IndexWriter.java:1654)
    at com.parc.uplib.indexing.LuceneIndexing.update(LuceneIndexing.java:414)
    at com.parc.uplib.indexing.LuceneIndexing.main(LuceneIndexing.java:659)
Caused by: org.apache.lucene.index.CorruptIndexException: docs out of order (8692 <= 10221 )
    at org.apache.lucene.index.SegmentMerger.appendPostings(SegmentMerger.java:474)
    at org.apache.lucene.index.SegmentMerger.mergeTermInfo(SegmentMerger.java:430)
    at org.apache.lucene.index.SegmentMerger.mergeTermInfos(SegmentMerger.java:402)
    at org.apache.lucene.index.SegmentMerger.mergeTerms(SegmentMerger.java:366)
    at org.apache.lucene.index.SegmentMerger.merge(SegmentMerger.java:123)
    at org.apache.lucene.index.IndexWriter.mergeMiddle(IndexWriter.java:3002)
    at org.apache.lucene.index.IndexWriter.merge(IndexWriter.java:2751)
    at org.apache.lucene.index.ConcurrentMergeScheduler$MergeThread.run(ConcurrentMergeScheduler.java:240)
Exception in thread "Thread-0" org.apache.lucene.index.MergePolicy$MergeException: org.apache.lucene.index.CorruptIndexException: docs out of order (8692 <= 10221 )
    at org.apache.lucene.index.ConcurrentMergeScheduler$MergeThread.run(ConcurrentMergeScheduler.java:274)
Caused by: org.apache.lucene.index.CorruptIndexException: docs out of order (8692 <= 10221 )
    at org.apache.lucene.index.SegmentMerger.appendPostings(SegmentMerger.java:474)
    at org.apache.lucene.index.SegmentMerger.mergeTermInfo(SegmentMerger.java:430)
    at org.apache.lucene.index.SegmentMerger.mergeTermInfos(SegmentMerger.java:402)
    at org.apache.lucene.index.SegmentMerger.mergeTerms(SegmentMerger.java:366)
    at org.apache.lucene.index.SegmentMerger.merge(SegmentMerger.java:123)
    at org.apache.lucene.index.IndexWriter.mergeMiddle(IndexWriter.java:3002)
    at org.apache.lucene.index.IndexWriter.merge(IndexWriter.java:2751)
    at org.apache.lucene.index.ConcurrentMergeScheduler$MergeThread.run(ConcurrentMergeScheduler.java:240)
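
For anyone reading the trace: the "docs out of order (8692 <= 10221 )" message is thrown from SegmentMerger.appendPostings, and its wording suggests a check that the document numbers written for a term during a merge keep increasing. Below is a minimal sketch of that kind of check in Java. It is not Lucene's code: the class, method, and exception names are hypothetical, and the posting lists in main are invented only so that the same message comes out. One possible reading of the numbers, offered as an assumption rather than a diagnosis: 8692 equals 8686 + 6, which would be a document from _95 after renumbering, while 10221 is larger than the 8712 documents in the whole merge.

```java
import java.util.ArrayList;
import java.util.List;

// Hypothetical illustration only; none of these names come from Lucene.
final class DocOrderCheckSketch {

    /** Stand-in for the exception in the trace (unchecked here to keep the sketch short). */
    static class OutOfOrderException extends RuntimeException {
        OutOfOrderException(String message) {
            super(message);
        }
    }

    /**
     * Appends the per-segment doc IDs of one term into a single merged list.
     * Each segment's local IDs are shifted by the number of documents in the
     * segments before it, and every appended ID must exceed the previous one.
     */
    static List<Integer> mergePostings(int[][] segmentDocIds, int[] segmentDocCounts) {
        List<Integer> merged = new ArrayList<>();
        int base = 0;      // number of documents in all earlier segments
        int lastDoc = -1;  // most recently appended merged doc ID
        for (int seg = 0; seg < segmentDocIds.length; seg++) {
            for (int localDoc : segmentDocIds[seg]) {
                int doc = base + localDoc;
                if (doc <= lastDoc) {
                    throw new OutOfOrderException(
                        "docs out of order (" + doc + " <= " + lastDoc + " )");
                }
                merged.add(doc);
                lastDoc = doc;
            }
            base += segmentDocCounts[seg];
        }
        return merged;
    }

    public static void main(String[] args) {
        // Well-formed input: two segments with 3 and 2 documents.
        System.out.println(mergePostings(new int[][] {{0, 2}, {1}}, new int[] {3, 2}));

        // Invented input: the first segment claims 8686 documents but lists
        // doc ID 10221, so local doc 6 of the second segment (8686 + 6 = 8692)
        // fails the ordering check.
        try {
            mergePostings(new int[][] {{10221}, {6}}, new int[] {8686, 26});
        } catch (OutOfOrderException e) {
            System.out.println(e.getMessage());
        }
    }
}
```

In this sketch the second call fails only because the first segment lists a document number beyond its own document count. Whether something comparable happened in the real _94 segment cannot be determined from the log alone.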