lucene-java-user mailing list archives

From Bill Janssen <jans...@parc.com>
Subject Re: lucene-core-2.2.0.jar broken? CorruptIndexException?
Date Thu, 29 Nov 2007 13:46:30 GMT
> Have you tried another PPC machine?

No.  The other PPC machine is in another location, but perhaps I can
get to it tomorrow.  On the other hand, since the same run succeeds
with 2.0, I doubt the machine is the problem.

OK, I've reverted to my original codebase (where I first create a
reader and do the deletions, then create a writer and do the additions
and optimize). It works fine with lucene-core-2.0.0, but fails with
last night's 2.3 build (lucene-core-2.3-2007-11-29_02-49-31.jar).
Here's the dump:

indexing with /Library/Java/Home/bin/java  -Dcom.parc.uplib.indexing.debugMode=true "-Dcom.parc.uplib.indexing.indexProperties=contents:title:categories$,*:date@:apparent-mime-type*:authors$\sand\s:comment:abstract:email-message-id*:email-guid*:email-subject:email-from-name:email-from-address*:email-attachment-to*:email-thread-index*:email-references$,*:email-in-reply-to$,*:keywords$,*:album:performer:composer:music-genre*:audio-length:accompaniment:paragraph-ids$,*:sha-hash*"
-classpath "/local/uplib/share/UpLib-1.7/code/lucene-core-2.3-2007-11-29_02-49-31.jar:/local/uplib/share/UpLib-1.7/code/LuceneIndexing.jar"
-Dorg.apache.lucene.writeLockTimeout=20000 com.parc.uplib.indexing.LuceneIndexing "/local/janssen-uplib/index"
update /local/janssen-uplib/docs 01160-06-3246-773 01159-97-2914-663 01159-89-7507-719 01159-89-5614-073
01159-89-1159-244 01159-89-0665-499
thr001: acquiring lock:  LuceneIndex...
thr001: acquired lock:  LuceneIndex*
thr001: releasing lock:  LuceneIndex*
thr001:   indexing output is <updating
doc_root_dir is /local/janssen-uplib/docs
index file is /local/janssen-uplib/index and it exists.
Deleted 1 existing instances of 01160-06-3246-773
Deleted 1 existing instances of 01159-97-2914-663
Deleted 5 existing instances of 01159-89-7507-719
Deleted 26 existing instances of 01159-89-5614-073
Deleted 5 existing instances of 01159-89-1159-244
Deleted 6 existing instances of 01159-89-0665-499
IFD [main]: setInfoStream deletionPolicy=org.apache.lucene.index.KeepOnlyLastCommitDeletionPolicy@9b42e6
IW 0 [main]: setInfoStream: dir=org.apache.lucene.store.FSDirectory@/local/janssen-uplib/index
autoCommit=true mergePolicy=org.apache.lucene.index.LogByteSizeMergePolicy@6c79d7 mergeScheduler=org.apache.lucene.index.ConcurrentMergeScheduler@b33d0a
ramBufferSizeMB=16.0 maxBuffereDocs=-1 maxBuffereDeleteTerms=-1 maxFieldLength=10000 index=_4j:c19686
IW 0 [main]: setMaxFieldLength 2147483647
Working on document /local/janssen-uplib/docs/01160-06-3246-773
  Adding header 'apparent-mime-type' I to 01160-06-3246-773
  Adding header 'authors' IT to 01160-06-3246-773
  Adding header 'categories' I (cartoon) to 01160-06-3246-773
  Adding header 'date' I (19951005) to 01160-06-3246-773
  Adding header 'sha-hash' I to 01160-06-3246-773
  Created empty doc Document<stored/uncompressed,indexed<id:01160-06-3246-773> stored/uncompressed,indexed<uplibdate:20061005>
stored/uncompressed,indexed<uplibtype:whole>>
Added 01160-06-3246-773 (1 versions)
Working on document /local/janssen-uplib/docs/01159-97-2914-663
  Adding header 'apparent-mime-type' I to 01159-97-2914-663
  Adding header 'authors' IT to 01159-97-2914-663
  Adding header 'categories' I (cartoon) to 01159-97-2914-663
  Adding header 'date' I (19951004) to 01159-97-2914-663
  Adding header 'sha-hash' I to 01159-97-2914-663
  Created empty doc Document<stored/uncompressed,indexed<id:01159-97-2914-663> stored/uncompressed,indexed<uplibdate:20061004>
stored/uncompressed,indexed<uplibtype:whole>>
Added 01159-97-2914-663 (1 versions)
Working on document /local/janssen-uplib/docs/01159-89-7507-719
  Adding header 'apparent-mime-type' I to 01159-89-7507-719
  Adding header 'sha-hash' I to 01159-89-7507-719
  Adding header 'title' IT (Photoshop Metal Texture) to 01159-89-7507-719
  Created empty doc Document<stored/uncompressed,indexed<id:01159-89-7507-719> stored/uncompressed,indexed<uplibdate:20061003>
stored/uncompressed,indexed<uplibtype:whole>>
  Using charset utf8 for contents.txt
  Using language en for contents.txt
    page 0 (580):  Tutorials\xa5 News\xa5 Exclusives\xa5 S
    page 1 (1680):  On a new layer create a gradie
    page 2 (1118):  Scrapes and scratches are irre
    page 3 (470):  Bevel Settings\xa5 Contour Settin
  Using charset utf8 for contents.txt
  Using language en for contents.txt
Added 01159-89-7507-719 (5 versions)
Working on document /local/janssen-uplib/docs/01159-89-5614-073
  Adding header 'apparent-mime-type' I to 01159-89-5614-073
  Adding header 'sha-hash' I to 01159-89-5614-073
  Adding header 'title' IT (Creating Virtual Mats and Frames with The GIMP) to 01159-89-5614-073
  Created empty doc Document<stored/uncompressed,indexed<id:01159-89-5614-073> stored/uncompressed,indexed<uplibdate:20061003>
stored/uncompressed,indexed<uplibtype:whole>>
  Using charset utf8 for contents.txt
  Using language en for contents.txt
    page 0 (663):  All photographs and articles o
    page 1 (600):  Although real mats and frames 
    page 2 (999):  The Procedure First of all you
    page 3 (615):  Run the Add Mat script (Script
    page 4 (693):  in the GIMP toolbox Pattern: U
    page 5 (703):  3D lighted/shaded appearance. 
    page 6 (719):  Bevel Fill Color, pops up a di
    page 7 (714):  texture afterwards. Default: o
    page 8 (797):  recommended, especially if you
    page 9 (461):  moving outwards, as in adding 
    page 10 (67):  11 Creating Virtual Mats and F
    page 11 (67):  12 Creating Virtual Mats and F
    page 12 (378):  Time to add a frame. Run Scrip
    page 13 (498):  in Frame Fill Color FG color: 
    page 14 (717):  and background colors, not in 
    page 15 (685):  the pattern to for texturing t
    page 16 (721):  added along the inner boundary
    page 17 (1006):  leave a selection in place cov
    page 18 (904):  A drop shadow on the entire fr
    page 19 (629):  threshold sliders to the right
    page 20 (901):  Bump Map" and fill it with whi
    page 21 (786):  image window, do a Select All 
    page 22 (393):  In the Layers dialog, choose t
    page 23 (937):  "Keep Trans." option near the 
    page 24 (239):  Last modified: Mon May 9 23:36
  Using charset utf8 for contents.txt
  Using language en for contents.txt
Added 01159-89-5614-073 (26 versions)
Working on document /local/janssen-uplib/docs/01159-89-1159-244
  Adding header 'apparent-mime-type' I to 01159-89-1159-244
  Adding header 'authors' IT to 01159-89-1159-244
  Adding header 'categories' I (ebooks) to 01159-89-1159-244
  Adding header 'categories' I (article) to 01159-89-1159-244
  Adding header 'date' I (20050100) to 01159-89-1159-244
  Adding header 'sha-hash' I to 01159-89-1159-244
  Adding header 'title' IT (The Future of Books) to 01159-89-1159-244
  Created empty doc Document<stored/uncompressed,indexed<id:01159-89-1159-244> stored/uncompressed,indexed<uplibdate:20061003>
stored/uncompressed,indexed<uplibtype:whole>>
  Using charset utf8 for contents.txt
  Using language en for contents.txt
    page 0 (3649):  Close Window The Future of Boo
    page 1 (4291):  Ken agreed and suggested that 
    page 2 (3934):  Catalog online but instead auc
    page 3 (2331):  At Marsh's workshop we watched
  Using charset utf8 for contents.txt
  Using language en for contents.txt
Added 01159-89-1159-244 (5 versions)
Working on document /local/janssen-uplib/docs/01159-89-0665-499
  Adding header 'apparent-mime-type' I to 01159-89-0665-499
  Adding header 'authors' IT to 01159-89-0665-499
  Adding header 'categories' I (review) to 01159-89-0665-499
  Adding header 'categories' I (article) to 01159-89-0665-499
  Adding header 'categories' I (ebooks) to 01159-89-0665-499
  Adding header 'date' I (20061019) to 01159-89-0665-499
  Adding header 'sha-hash' I to 01159-89-0665-499
  Adding header 'title' IT (Books@Google) to 01159-89-0665-499
  Created empty doc Document<stored/uncompressed,indexed<id:01159-89-0665-499> stored/uncompressed,indexed<uplibdate:20061003>
stored/uncompressed,indexed<uplibtype:whole>>
  Using charset utf8 for contents.txt
  Using language en for contents.txt
    page 0 (1595):  Home \xe1 Your account \xe1 Current 
    page 1 (4636):  sapienscompelled the invention
    page 2 (4341):  to a base population of 40,000
    page 3 (5113):  The privacy policy and the Chi
    page 4 (792):  Notes [1] Wikipedia, unlike Go
  Using charset utf8 for contents.txt
  Using language en for contents.txt
Added 01159-89-0665-499 (6 versions)
Optimizing...
IW 0 [main]: optimize: index now _4j:c19686
IW 0 [main]:   flush: segment=_4k docStoreSegment=_4k docStoreOffset=0 flushDocs=true flushDeletes=false
flushDocStores=true numDocs=44 numBufDelTerms=0
IW 0 [main]:   index before flush _4j:c19686

closeDocStore: 2 files to flush to segment _4k

flush postings as segment _4k numDocs=44
  oldRAMSize=141248 newFlushedSize=67003 docs/MB=688.586 new/old=47.436%
IW 0 [main]: checkpoint: wrote segments file "segments_be"
IFD [main]: now checkpoint "segments_be" [2 segments ; isCommit = true]
IFD [main]: deleteCommits: now remove commit "segments_bd"
IFD [main]: delete "segments_bd"
IW 0 [main]: checkpoint: wrote segments file "segments_bf"
IFD [main]: now checkpoint "segments_bf" [2 segments ; isCommit = true]
IFD [main]: deleteCommits: now remove commit "segments_be"
IFD [main]: delete "_4k.fnm"
IFD [main]: delete "_4k.frq"
IFD [main]: delete "_4k.prx"
IFD [main]: delete "_4k.tis"
IFD [main]: delete "_4k.tii"
IFD [main]: delete "_4k.nrm"
IFD [main]: delete "_4k.fdx"
IFD [main]: delete "_4k.fdt"
IFD [main]: delete "segments_be"
IW 0 [main]: LMP: findMerges: 2 segments
IW 0 [main]: LMP:   level 6.744767 to 7.494767: 1 segments
IW 0 [main]: LMP:   level -1.0 to 4.842865: 1 segments
IW 0 [main]: CMS: now merge
IW 0 [main]: CMS:   index: _4j:c19686 _4k:c44
IW 0 [main]: CMS:   no more merges pending; now return
IW 0 [main]: add merge to pendingMerges: _4j:c19686 _4k:c44 [optimize] [total 1 pending]
IW 0 [main]: CMS: now merge
IW 0 [main]: CMS:   index: _4j:c19686 _4k:c44
IW 0 [main]: CMS:   consider merge _4j:c19686 _4k:c44 into _4l [optimize]
IW 0 [main]: CMS:     launch new thread [Thread-0]
IW 0 [main]: CMS:   no more merges pending; now return
IW 0 [Thread-0]: CMS:   merge thread: start
IW 0 [Thread-0]: now merge
  merge=_4j:c19686 _4k:c44 into _4l [optimize]
  index=_4j:c19686 _4k:c44
IW 0 [Thread-0]: merging _4j:c19686 _4k:c44 into _4l [optimize]
IW 0 [Thread-0]: merge: total 19686 docs
IW 0 [Thread-0]: hit exception during merge; now refresh deleter on segment _4l
IFD [Thread-0]: refresh [prefix=_4l]: removing newly created unreferenced file "_4l.fdt"
IFD [Thread-0]: delete "_4l.fdt"
IFD [Thread-0]: refresh [prefix=_4l]: removing newly created unreferenced file "_4l.fdx"
IFD [Thread-0]: delete "_4l.fdx"
IFD [Thread-0]: refresh [prefix=_4l]: removing newly created unreferenced file "_4l.fnm"
IFD [Thread-0]: delete "_4l.fnm"
IFD [Thread-0]: refresh [prefix=_4l]: removing newly created unreferenced file "_4l.frq"
IFD [Thread-0]: delete "_4l.frq"
IFD [Thread-0]: refresh [prefix=_4l]: removing newly created unreferenced file "_4l.prx"
IFD [Thread-0]: delete "_4l.prx"
IFD [Thread-0]: refresh [prefix=_4l]: removing newly created unreferenced file "_4l.tii"
IFD [Thread-0]: delete "_4l.tii"
IFD [Thread-0]: refresh [prefix=_4l]: removing newly created unreferenced file "_4l.tis"
IFD [Thread-0]: delete "_4l.tis"
IW 0 [Thread-0]: hit exception during merge
java.io.IOException: background merge hit exception: _4j:c19686 _4k:c44 into _4l [optimize]
	at org.apache.lucene.index.IndexWriter.optimize(IndexWriter.java:1705)
	at org.apache.lucene.index.IndexWriter.optimize(IndexWriter.java:1654)
	at com.parc.uplib.indexing.LuceneIndexing.update(LuceneIndexing.java:414)
	at com.parc.uplib.indexing.LuceneIndexing.main(LuceneIndexing.java:659)
Caused by: java.lang.ArrayIndexOutOfBoundsException: Array index out of range: 21352
	at org.apache.lucene.util.BitVector.get(BitVector.java:72)
	at org.apache.lucene.index.SegmentTermDocs.next(SegmentTermDocs.java:118)
	at org.apache.lucene.index.SegmentTermPositions.next(SegmentTermPositions.java:95)
	at org.apache.lucene.index.SegmentMerger.appendPostings(SegmentMerger.java:467)
	at org.apache.lucene.index.SegmentMerger.mergeTermInfo(SegmentMerger.java:430)
	at org.apache.lucene.index.SegmentMerger.mergeTermInfos(SegmentMerger.java:402)
	at org.apache.lucene.index.SegmentMerger.mergeTerms(SegmentMerger.java:366)
	at org.apache.lucene.index.SegmentMerger.merge(SegmentMerger.java:123)
	at org.apache.lucene.index.IndexWriter.mergeMiddle(IndexWriter.java:3002)
	at org.apache.lucene.index.IndexWriter.merge(IndexWriter.java:2751)
	at org.apache.lucene.index.ConcurrentMergeScheduler$MergeThread.run(ConcurrentMergeScheduler.java:240)
Exception in thread "Thread-0" org.apache.lucene.index.MergePolicy$MergeException: java.lang.ArrayIndexOutOfBoundsException:
Array index out of range: 21352
	at org.apache.lucene.index.ConcurrentMergeScheduler$MergeThread.run(ConcurrentMergeScheduler.java:274)
Caused by: java.lang.ArrayIndexOutOfBoundsException: Array index out of range: 21352
	at org.apache.lucene.util.BitVector.get(BitVector.java:72)
	at org.apache.lucene.index.SegmentTermDocs.next(SegmentTermDocs.java:118)
	at org.apache.lucene.index.SegmentTermPositions.next(SegmentTermPositions.java:95)
	at org.apache.lucene.index.SegmentMerger.appendPostings(SegmentMerger.java:467)
	at org.apache.lucene.index.SegmentMerger.mergeTermInfo(SegmentMerger.java:430)
	at org.apache.lucene.index.SegmentMerger.mergeTermInfos(SegmentMerger.java:402)
	at org.apache.lucene.index.SegmentMerger.mergeTerms(SegmentMerger.java:366)
	at org.apache.lucene.index.SegmentMerger.merge(SegmentMerger.java:123)
	at org.apache.lucene.index.IndexWriter.mergeMiddle(IndexWriter.java:3002)
	at org.apache.lucene.index.IndexWriter.merge(IndexWriter.java:2751)
	at org.apache.lucene.index.ConcurrentMergeScheduler$MergeThread.run(ConcurrentMergeScheduler.java:240)
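
The innermost frame in both traces is an array bounds failure inside BitVector.get, reached while the merge walks postings and checks each document against the deleted-docs bits. The sketch below is a simplified stand-in, not Lucene's BitVector source: the class name DeletedDocsSketch, its methods, and the byte-per-eight-documents layout are assumptions made for illustration. It shows how a deletion vector sized for a 19686-document segment (the size of _4j in the log) throws when asked about a document number far beyond that size. Reading the reported index 21352 as a byte offset is also an assumption; under it, the document number being checked would be around 21352 * 8 = 170816.

```java
// Simplified stand-in for a deleted-docs bit vector; NOT Lucene's BitVector source.
final class DeletedDocsSketch {
    // One bit per document, eight documents per byte (assumed layout).
    private final byte[] bits;

    DeletedDocsSketch(int docCount) {
        // Assumption: storage is rounded up to whole bytes.
        bits = new byte[(docCount >> 3) + 1];
    }

    void markDeleted(int doc) {
        bits[doc >> 3] |= (byte) (1 << (doc & 7));
    }

    boolean isDeleted(int doc) {
        // No explicit range check: an oversized doc number fails on the array access.
        return (bits[doc >> 3] & (1 << (doc & 7))) != 0;
    }

    public static void main(String[] args) {
        // 19686 is the document count of segment _4j in the log above.
        DeletedDocsSketch deleted = new DeletedDocsSketch(19686);
        deleted.markDeleted(42);
        System.out.println("doc 42 deleted? " + deleted.isDeleted(42));
        System.out.println("doc 43 deleted? " + deleted.isDeleted(43));
        try {
            // Hypothetical doc number whose byte offset would be 21352.
            deleted.isDeleted(21352 * 8);
        } catch (ArrayIndexOutOfBoundsException e) {
            System.out.println("lookup failed: " + e);
        }
    }
}
```

If that reading holds, the failure says that a posting in one of the merged segments yielded a document number the segment cannot contain. That would point at the postings data, or at how it is being decoded, rather than at the deletion step itself. The trace alone does not settle which.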