lucene-java-user mailing list archives

From Grant Ingersoll <gsing...@apache.org>
Subject Re: lucene-core-2.2.0.jar broken? CorruptIndexException?
Date Thu, 29 Nov 2007 22:29:55 GMT
Just a theory (make that a guess), Mike, but is it possible that the
merge scheduler is hitting a synchronization issue with the
deletedDocuments bit vector?  That is, one thread is cleaning it up while
the other is accessing it, and they aren't synchronizing their access?

This wouldn't explain the original problem, but maybe it explains this one?
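
The guess above can be illustrated with a minimal, self-contained Java sketch. It is not Lucene's code: TinyBitVector, UnsafeHolder, SafeHolder and the starting size of 30000 are made-up names and values, and the bad interleaving is replayed in a fixed order on one thread so the result is repeatable. Only the segment size (19686) and the failing index (21352) come from the dump below. Whether anything like this actually happens inside the merge is exactly what is unknown.

```java
class DeletedDocsSketch {

    /** Simplified stand-in for a deleted-docs bit vector; NOT Lucene's BitVector. */
    static final class TinyBitVector {
        private final byte[] bits;
        private final int size;

        TinyBitVector(int size) {
            this.size = size;
            this.bits = new byte[(size >> 3) + 1];
        }

        int size() { return size; }

        void set(int bit) { bits[bit >> 3] |= (byte) (1 << (bit & 7)); }

        // Relies on the array's own bounds check: a bit whose byte index
        // lies past the end of the array throws ArrayIndexOutOfBoundsException.
        boolean get(int bit) { return (bits[bit >> 3] & (1 << (bit & 7))) != 0; }
    }

    /** Hypothetical holder with no locking: the vector can change between two reads. */
    static final class UnsafeHolder {
        TinyBitVector deleted;
        UnsafeHolder(TinyBitVector v) { deleted = v; }
    }

    /** Hypothetical holder where check-and-read and replace share one lock. */
    static final class SafeHolder {
        private TinyBitVector deleted;
        SafeHolder(TinyBitVector v) { deleted = v; }
        synchronized void replace(TinyBitVector v) { deleted = v; }
        synchronized boolean isDeleted(int docId) {
            return docId < deleted.size() && deleted.get(docId);
        }
    }

    /** Replays reader/cleaner steps in a fixed order; true if the reader blew up. */
    static boolean replayBadInterleaving() {
        UnsafeHolder h = new UnsafeHolder(new TinyBitVector(30000));
        int docId = 21352;
        boolean checkPassed = docId < h.deleted.size(); // reader: range check passes
        h.deleted = new TinyBitVector(19686);           // cleaner: swaps in a smaller vector
        try {
            h.deleted.get(docId);                       // reader: uses the swapped vector
            return false;
        } catch (ArrayIndexOutOfBoundsException e) {
            return checkPassed;
        }
    }

    public static void main(String[] args) {
        System.out.println("unsafe replay threw: " + replayBadInterleaving());
        SafeHolder safe = new SafeHolder(new TinyBitVector(30000));
        safe.replace(new TinyBitVector(19686));
        System.out.println("safe lookup of 21352: " + safe.isDeleted(21352));
    }
}
```

In the unsafe holder the range check and the read are two separate steps, so a swap in between leaves the reader indexing a vector that is too small. The safe holder does both under one lock, so a concurrent replace cannot land between them.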

On Nov 29, 2007, at 4:46 PM, Bill Janssen wrote:

>> Have you tried another PPC machine?
>
> No.  It's in another location, but perhaps I can get it tomorrow.  On
> the other hand, the success when using 2.0 makes it likely to me that
> the machine isn't the problem.
>
> OK, I've reverted to my original codebase (where I first create a
> reader and do the deletions, then create a writer and do the additions
> and optimize), and it works fine with lucene-core-2.0.0, but fails
> with lucene-core-2.3.-whatever (last night's build).  Here's the dump:
>
> indexing with /Library/Java/Home/bin/java
>   -Dcom.parc.uplib.indexing.debugMode=true
>   "-Dcom.parc.uplib.indexing.indexProperties=contents:title:categories$,*:date@:apparent-mime-type*:authors$\sand\s:comment:abstract:email-message-id*:email-guid*:email-subject:email-from-name:email-from-address*:email-attachment-to*:email-thread-index*:email-references$,*:email-in-reply-to$,*:keywords$,*:album:performer:composer:music-genre*:audio-length:accompaniment:paragraph-ids$,*:sha-hash*"
>   -classpath "/local/uplib/share/UpLib-1.7/code/lucene-core-2.3-2007-11-29_02-49-31.jar:/local/uplib/share/UpLib-1.7/code/LuceneIndexing.jar"
>   -Dorg.apache.lucene.writeLockTimeout=20000
>   com.parc.uplib.indexing.LuceneIndexing "/local/janssen-uplib/index"
>   update /local/janssen-uplib/docs 01160-06-3246-773 01159-97-2914-663
>   01159-89-7507-719 01159-89-5614-073 01159-89-1159-244
>   01159-89-0665-499
> thr001: acquiring lock:  LuceneIndex...
> thr001: acquired lock:  LuceneIndex*
> thr001: releasing lock:  LuceneIndex*
> thr001:   indexing output is <updating
> doc_root_dir is /local/janssen-uplib/docs
> index file is /local/janssen-uplib/index and it exists.
> Deleted 1 existing instances of 01160-06-3246-773
> Deleted 1 existing instances of 01159-97-2914-663
> Deleted 5 existing instances of 01159-89-7507-719
> Deleted 26 existing instances of 01159-89-5614-073
> Deleted 5 existing instances of 01159-89-1159-244
> Deleted 6 existing instances of 01159-89-0665-499
> IFD [main]: setInfoStream deletionPolicy=org.apache.lucene.index.KeepOnlyLastCommitDeletionPolicy@9b42e6
> IW 0 [main]: setInfoStream: dir=org.apache.lucene.store.FSDirectory@/local/janssen-uplib/index autoCommit=true mergePolicy=org.apache.lucene.index.LogByteSizeMergePolicy@6c79d7 mergeScheduler=org.apache.lucene.index.ConcurrentMergeScheduler@b33d0a ramBufferSizeMB=16.0 maxBuffereDocs=-1 maxBuffereDeleteTerms=-1 maxFieldLength=10000 index=_4j:c19686
> IW 0 [main]: setMaxFieldLength 2147483647
> Working on document /local/janssen-uplib/docs/01160-06-3246-773
>  Adding header 'apparent-mime-type' I to 01160-06-3246-773
>  Adding header 'authors' IT to 01160-06-3246-773
>  Adding header 'categories' I (cartoon) to 01160-06-3246-773
>  Adding header 'date' I (19951005) to 01160-06-3246-773
>  Adding header 'sha-hash' I to 01160-06-3246-773
>  Created empty doc Document<stored/uncompressed,indexed<id:01160-06-3246-773> stored/uncompressed,indexed<uplibdate:20061005> stored/uncompressed,indexed<uplibtype:whole>>
> Added 01160-06-3246-773 (1 versions)
> Working on document /local/janssen-uplib/docs/01159-97-2914-663
>  Adding header 'apparent-mime-type' I to 01159-97-2914-663
>  Adding header 'authors' IT to 01159-97-2914-663
>  Adding header 'categories' I (cartoon) to 01159-97-2914-663
>  Adding header 'date' I (19951004) to 01159-97-2914-663
>  Adding header 'sha-hash' I to 01159-97-2914-663
>  Created empty doc Document<stored/uncompressed,indexed<id:01159-97-2914-663> stored/uncompressed,indexed<uplibdate:20061004> stored/uncompressed,indexed<uplibtype:whole>>
> Added 01159-97-2914-663 (1 versions)
> Working on document /local/janssen-uplib/docs/01159-89-7507-719
>  Adding header 'apparent-mime-type' I to 01159-89-7507-719
>  Adding header 'sha-hash' I to 01159-89-7507-719
>  Adding header 'title' IT (Photoshop Metal Texture) to 01159-89-7507-719
>  Created empty doc Document<stored/uncompressed,indexed<id:01159-89-7507-719> stored/uncompressed,indexed<uplibdate:20061003> stored/uncompressed,indexed<uplibtype:whole>>
>  Using charset utf8 for contents.txt
>  Using language en for contents.txt
>    page 0 (580):  Tutorials\xa5 News\xa5 Exclusives\xa5 S
>    page 1 (1680):  On a new layer create a gradie
>    page 2 (1118):  Scrapes and scratches are irre
>    page 3 (470):  Bevel Settings\xa5 Contour Settin
>  Using charset utf8 for contents.txt
>  Using language en for contents.txt
> Added 01159-89-7507-719 (5 versions)
> Working on document /local/janssen-uplib/docs/01159-89-5614-073
>  Adding header 'apparent-mime-type' I to 01159-89-5614-073
>  Adding header 'sha-hash' I to 01159-89-5614-073
>  Adding header 'title' IT (Creating Virtual Mats and Frames with The GIMP) to 01159-89-5614-073
>  Created empty doc Document<stored/uncompressed,indexed<id:01159-89-5614-073> stored/uncompressed,indexed<uplibdate:20061003> stored/uncompressed,indexed<uplibtype:whole>>
>  Using charset utf8 for contents.txt
>  Using language en for contents.txt
>    page 0 (663):  All photographs and articles o
>    page 1 (600):  Although real mats and frames
>    page 2 (999):  The Procedure First of all you
>    page 3 (615):  Run the Add Mat script (Script
>    page 4 (693):  in the GIMP toolbox Pattern: U
>    page 5 (703):  3D lighted/shaded appearance.
>    page 6 (719):  Bevel Fill Color, pops up a di
>    page 7 (714):  texture afterwards. Default: o
>    page 8 (797):  recommended, especially if you
>    page 9 (461):  moving outwards, as in adding
>    page 10 (67):  11 Creating Virtual Mats and F
>    page 11 (67):  12 Creating Virtual Mats and F
>    page 12 (378):  Time to add a frame. Run Scrip
>    page 13 (498):  in Frame Fill Color FG color:
>    page 14 (717):  and background colors, not in
>    page 15 (685):  the pattern to for texturing t
>    page 16 (721):  added along the inner boundary
>    page 17 (1006):  leave a selection in place cov
>    page 18 (904):  A drop shadow on the entire fr
>    page 19 (629):  threshold sliders to the right
>    page 20 (901):  Bump Map" and fill it with whi
>    page 21 (786):  image window, do a Select All
>    page 22 (393):  In the Layers dialog, choose t
>    page 23 (937):  "Keep Trans." option near the
>    page 24 (239):  Last modified: Mon May 9 23:36
>  Using charset utf8 for contents.txt
>  Using language en for contents.txt
> Added 01159-89-5614-073 (26 versions)
> Working on document /local/janssen-uplib/docs/01159-89-1159-244
>  Adding header 'apparent-mime-type' I to 01159-89-1159-244
>  Adding header 'authors' IT to 01159-89-1159-244
>  Adding header 'categories' I (ebooks) to 01159-89-1159-244
>  Adding header 'categories' I (article) to 01159-89-1159-244
>  Adding header 'date' I (20050100) to 01159-89-1159-244
>  Adding header 'sha-hash' I to 01159-89-1159-244
>  Adding header 'title' IT (The Future of Books) to 01159-89-1159-244
>  Created empty doc Document<stored/uncompressed,indexed<id:01159-89-1159-244> stored/uncompressed,indexed<uplibdate:20061003> stored/uncompressed,indexed<uplibtype:whole>>
>  Using charset utf8 for contents.txt
>  Using language en for contents.txt
>    page 0 (3649):  Close Window The Future of Boo
>    page 1 (4291):  Ken agreed and suggested that
>    page 2 (3934):  Catalog online but instead auc
>    page 3 (2331):  At Marsh's workshop we watched
>  Using charset utf8 for contents.txt
>  Using language en for contents.txt
> Added 01159-89-1159-244 (5 versions)
> Working on document /local/janssen-uplib/docs/01159-89-0665-499
>  Adding header 'apparent-mime-type' I to 01159-89-0665-499
>  Adding header 'authors' IT to 01159-89-0665-499
>  Adding header 'categories' I (review) to 01159-89-0665-499
>  Adding header 'categories' I (article) to 01159-89-0665-499
>  Adding header 'categories' I (ebooks) to 01159-89-0665-499
>  Adding header 'date' I (20061019) to 01159-89-0665-499
>  Adding header 'sha-hash' I to 01159-89-0665-499
>  Adding header 'title' IT (Books@Google) to 01159-89-0665-499
>  Created empty doc Document<stored/uncompressed,indexed<id:01159-89-0665-499> stored/uncompressed,indexed<uplibdate:20061003> stored/uncompressed,indexed<uplibtype:whole>>
>  Using charset utf8 for contents.txt
>  Using language en for contents.txt
>    page 0 (1595):  Home \xe1 Your account \xe1 Current
>    page 1 (4636):  sapienscompelled the invention
>    page 2 (4341):  to a base population of 40,000
>    page 3 (5113):  The privacy policy and the Chi
>    page 4 (792):  Notes [1] Wikipedia, unlike Go
>  Using charset utf8 for contents.txt
>  Using language en for contents.txt
> Added 01159-89-0665-499 (6 versions)
> Optimizing...
> IW 0 [main]: optimize: index now _4j:c19686
> IW 0 [main]:   flush: segment=_4k docStoreSegment=_4k docStoreOffset=0 flushDocs=true flushDeletes=false flushDocStores=true numDocs=44 numBufDelTerms=0
> IW 0 [main]:   index before flush _4j:c19686
>
> closeDocStore: 2 files to flush to segment _4k
>
> flush postings as segment _4k numDocs=44
>  oldRAMSize=141248 newFlushedSize=67003 docs/MB=688.586 new/old=47.436%
> IW 0 [main]: checkpoint: wrote segments file "segments_be"
> IFD [main]: now checkpoint "segments_be" [2 segments ; isCommit = true]
> IFD [main]: deleteCommits: now remove commit "segments_bd"
> IFD [main]: delete "segments_bd"
> IW 0 [main]: checkpoint: wrote segments file "segments_bf"
> IFD [main]: now checkpoint "segments_bf" [2 segments ; isCommit = true]
> IFD [main]: deleteCommits: now remove commit "segments_be"
> IFD [main]: delete "_4k.fnm"
> IFD [main]: delete "_4k.frq"
> IFD [main]: delete "_4k.prx"
> IFD [main]: delete "_4k.tis"
> IFD [main]: delete "_4k.tii"
> IFD [main]: delete "_4k.nrm"
> IFD [main]: delete "_4k.fdx"
> IFD [main]: delete "_4k.fdt"
> IFD [main]: delete "segments_be"
> IW 0 [main]: LMP: findMerges: 2 segments
> IW 0 [main]: LMP:   level 6.744767 to 7.494767: 1 segments
> IW 0 [main]: LMP:   level -1.0 to 4.842865: 1 segments
> IW 0 [main]: CMS: now merge
> IW 0 [main]: CMS:   index: _4j:c19686 _4k:c44
> IW 0 [main]: CMS:   no more merges pending; now return
> IW 0 [main]: add merge to pendingMerges: _4j:c19686 _4k:c44 [optimize] [total 1 pending]
> IW 0 [main]: CMS: now merge
> IW 0 [main]: CMS:   index: _4j:c19686 _4k:c44
> IW 0 [main]: CMS:   consider merge _4j:c19686 _4k:c44 into _4l [optimize]
> IW 0 [main]: CMS:     launch new thread [Thread-0]
> IW 0 [main]: CMS:   no more merges pending; now return
> IW 0 [Thread-0]: CMS:   merge thread: start
> IW 0 [Thread-0]: now merge
>  merge=_4j:c19686 _4k:c44 into _4l [optimize]
>  index=_4j:c19686 _4k:c44
> IW 0 [Thread-0]: merging _4j:c19686 _4k:c44 into _4l [optimize]
> IW 0 [Thread-0]: merge: total 19686 docs
> IW 0 [Thread-0]: hit exception during merge; now refresh deleter on segment _4l
> IFD [Thread-0]: refresh [prefix=_4l]: removing newly created unreferenced file "_4l.fdt"
> IFD [Thread-0]: delete "_4l.fdt"
> IFD [Thread-0]: refresh [prefix=_4l]: removing newly created unreferenced file "_4l.fdx"
> IFD [Thread-0]: delete "_4l.fdx"
> IFD [Thread-0]: refresh [prefix=_4l]: removing newly created unreferenced file "_4l.fnm"
> IFD [Thread-0]: delete "_4l.fnm"
> IFD [Thread-0]: refresh [prefix=_4l]: removing newly created unreferenced file "_4l.frq"
> IFD [Thread-0]: delete "_4l.frq"
> IFD [Thread-0]: refresh [prefix=_4l]: removing newly created unreferenced file "_4l.prx"
> IFD [Thread-0]: delete "_4l.prx"
> IFD [Thread-0]: refresh [prefix=_4l]: removing newly created unreferenced file "_4l.tii"
> IFD [Thread-0]: delete "_4l.tii"
> IFD [Thread-0]: refresh [prefix=_4l]: removing newly created unreferenced file "_4l.tis"
> IFD [Thread-0]: delete "_4l.tis"
> IW 0 [Thread-0]: hit exception during merge
> java.io.IOException: background merge hit exception: _4j:c19686 _4k:c44 into _4l [optimize]
> 	at org.apache.lucene.index.IndexWriter.optimize(IndexWriter.java:1705)
> 	at org.apache.lucene.index.IndexWriter.optimize(IndexWriter.java:1654)
> 	at com.parc.uplib.indexing.LuceneIndexing.update(LuceneIndexing.java:414)
> 	at com.parc.uplib.indexing.LuceneIndexing.main(LuceneIndexing.java:659)
> Caused by: java.lang.ArrayIndexOutOfBoundsException: Array index out of range: 21352
> 	at org.apache.lucene.util.BitVector.get(BitVector.java:72)
> 	at org.apache.lucene.index.SegmentTermDocs.next(SegmentTermDocs.java:118)
> 	at org.apache.lucene.index.SegmentTermPositions.next(SegmentTermPositions.java:95)
> 	at org.apache.lucene.index.SegmentMerger.appendPostings(SegmentMerger.java:467)
> 	at org.apache.lucene.index.SegmentMerger.mergeTermInfo(SegmentMerger.java:430)
> 	at org.apache.lucene.index.SegmentMerger.mergeTermInfos(SegmentMerger.java:402)
> 	at org.apache.lucene.index.SegmentMerger.mergeTerms(SegmentMerger.java:366)
> 	at org.apache.lucene.index.SegmentMerger.merge(SegmentMerger.java:123)
> 	at org.apache.lucene.index.IndexWriter.mergeMiddle(IndexWriter.java:3002)
> 	at org.apache.lucene.index.IndexWriter.merge(IndexWriter.java:2751)
> 	at org.apache.lucene.index.ConcurrentMergeScheduler$MergeThread.run(ConcurrentMergeScheduler.java:240)
> Exception in thread "Thread-0" org.apache.lucene.index.MergePolicy$MergeException: java.lang.ArrayIndexOutOfBoundsException: Array index out of range: 21352
> 	at org.apache.lucene.index.ConcurrentMergeScheduler$MergeThread.run(ConcurrentMergeScheduler.java:274)
> Caused by: java.lang.ArrayIndexOutOfBoundsException: Array index out of range: 21352
> 	at org.apache.lucene.util.BitVector.get(BitVector.java:72)
> 	at org.apache.lucene.index.SegmentTermDocs.next(SegmentTermDocs.java:118)
> 	at org.apache.lucene.index.SegmentTermPositions.next(SegmentTermPositions.java:95)
> 	at org.apache.lucene.index.SegmentMerger.appendPostings(SegmentMerger.java:467)
> 	at org.apache.lucene.index.SegmentMerger.mergeTermInfo(SegmentMerger.java:430)
> 	at org.apache.lucene.index.SegmentMerger.mergeTermInfos(SegmentMerger.java:402)
> 	at org.apache.lucene.index.SegmentMerger.mergeTerms(SegmentMerger.java:366)
> 	at org.apache.lucene.index.SegmentMerger.merge(SegmentMerger.java:123)
> 	at org.apache.lucene.index.IndexWriter.mergeMiddle(IndexWriter.java:3002)
> 	at org.apache.lucene.index.IndexWriter.merge(IndexWriter.java:2751)
> 	at org.apache.lucene.index.ConcurrentMergeScheduler$MergeThread.run(ConcurrentMergeScheduler.java:240)
>>
>
>
> ---------------------------------------------------------------------
> To unsubscribe, e-mail: java-user-unsubscribe@lucene.apache.org
> For additional commands, e-mail: java-user-help@lucene.apache.org
>

--------------------------
Grant Ingersoll
http://lucene.grantingersoll.com

Lucene Helpful Hints:
http://wiki.apache.org/lucene-java/BasicsOfPerformance
http://wiki.apache.org/lucene-java/LuceneFAQ
