lucene-java-user mailing list archives

Site index · List index
Message view « Date » · « Thread »
Top « Date » · « Thread »
From Bill Janssen <jans...@parc.com>
Subject Re: lucene-core-2.2.0.jar broken? CorruptIndexException?
Date Thu, 29 Nov 2007 14:15:54 GMT
So, it's a little clearer.  I get the Array-out-of-bounds exception if
I'm re-indexing some already indexed documents -- if there are
deletions involved.  I get the CorruptIndexException if I'm indexing
freshly -- no deletions.  Here's an example of that (with the latest
nightly).  I removed the existing index, then reindexed the collection
six UpLib docs at a time, till I hit the corruption.

Bill

/Library/Java/Home/bin/java  -Dcom.parc.uplib.indexing.debugMode=true "-Dcom.parc.uplib.indexing.indexProperties=contents:title:categories$,*:date@:apparent-mime-type*:authors$\sand\s:comment:abstract:email-message-id*:email-guid*:email-subject:email-from-name:email-from-address*:email-attachment-to*:email-thread-index*:email-references$,*:email-in-reply-to$,*:keywords$,*:album:performer:composer:music-genre*:audio-length:accompaniment:paragraph-ids$,*:sha-hash*"
-classpath "/local/uplib/share/UpLib-1.7/code/lucene-core-2.3-2007-11-29_02-49-31.jar:/local/uplib/share/UpLib-1.7/code/LuceneIndexing.jar"
-Dorg.apache.lucene.writeLockTimeout=20000 com.parc.uplib.indexing.LuceneIndexing "/local/janssen-uplib/index"
update /local/janssen-uplib/docs 01113-86-6099-767 01113-86-5485-936 01113-86-0975-795 01113-62-2881-882
01113-44-7730-580 01113-44-7684-477
thr002: acquiring lock:  LuceneIndex...
thr002: acquired lock:  LuceneIndex*
thr002: releasing lock:  LuceneIndex*
thr002:   indexing output is <updating
doc_root_dir is /local/janssen-uplib/docs
index file is /local/janssen-uplib/index and it exists.
Deleted 0 existing instances of 01113-86-6099-767
Deleted 0 existing instances of 01113-86-5485-936
Deleted 0 existing instances of 01113-86-0975-795
Deleted 0 existing instances of 01113-62-2881-882
Deleted 0 existing instances of 01113-44-7730-580
Deleted 0 existing instances of 01113-44-7684-477
IFD [main]: setInfoStream deletionPolicy=org.apache.lucene.index.KeepOnlyLastCommitDeletionPolicy@4865ce
IW 0 [main]: setInfoStream: dir=org.apache.lucene.store.FSDirectory@/local/janssen-uplib/index
autoCommit=true mergePolicy=org.apache.lucene.index.LogByteSizeMergePolicy@4ae2c1 mergeScheduler=org.apache.lucene.index.ConcurrentMergeScheduler@6d2702
ramBufferSizeMB=16.0 maxBuffereDocs=-1 maxBuffereDeleteTerms=-1 maxFieldLength=10000 index=_94:c8686
IW 0 [main]: setMaxFieldLength 2147483647
Working on document /local/janssen-uplib/docs/01113-86-6099-767
  Adding header 'abstract' IT to 01113-86-6099-767
  Adding header 'apparent-mime-type' I to 01113-86-6099-767
  Adding header 'authors' IT to 01113-86-6099-767
  Adding header 'categories' I (flowport) to 01113-86-6099-767
  Adding header 'categories' I (article) to 01113-86-6099-767
  Adding header 'categories' I (flowport) to 01113-86-6099-767
  Adding header 'categories' I (flowport) to 01113-86-6099-767
  Adding header 'citation' I to 01113-86-6099-767
  Adding header 'date' I (20030800) to 01113-86-6099-767
  Adding header 'sha-hash' I to 01113-86-6099-767
  Adding header 'title' IT (The Great Chive Division) to 01113-86-6099-767
  Created empty doc Document<stored/uncompressed,indexed<id:01113-86-6099-767> stored/uncompressed,indexed<uplibdate:20050418>
stored/uncompressed,indexed<uplibtype:whole>>
  Using charset utf8 for contents.txt
  Using language en for contents.txt
    page 0 (4219):  Question: My chives have grown
  Using charset utf8 for contents.txt
  Using language en for contents.txt
Added 01113-86-6099-767 (2 versions)
Working on document /local/janssen-uplib/docs/01113-86-5485-936
  Adding header 'abstract' IT to 01113-86-5485-936
  Adding header 'apparent-mime-type' I to 01113-86-5485-936
  Adding header 'authors' IT to 01113-86-5485-936
  Adding header 'categories' I (paper) to 01113-86-5485-936
  Adding header 'categories' I (sensepad) to 01113-86-5485-936
  Adding header 'citation' I to 01113-86-5485-936
  Adding header 'date' I (20040524) to 01113-86-5485-936
  Adding header 'sha-hash' I to 01113-86-5485-936
  Adding header 'title' IT (Designing Interaction, not Interfaces) to 01113-86-5485-936
  Created empty doc Document<stored/uncompressed,indexed<id:01113-86-5485-936> stored/uncompressed,indexed<uplibdate:20050418>
stored/uncompressed,indexed<uplibtype:whole>>
  Using charset utf8 for contents.txt
  Using language en for contents.txt
    page 0 (3855):  Designing Interaction, not Int
    page 1 (5688):  Figure 1. Interaction as a phe
    page 2 (5831):  Interaction models can be eval
    page 3 (5770):  Reification turns concepts and
    page 4 (5558):  Figure 6. A mock-up of the DPI
    page 5 (5963):  In joint work with Yves Guiard
    page 6 (6819):  I propose making interactions 
    page 7 (5622):  Graphical Application. Proc. A
  Using charset utf8 for contents.txt
  Using language en for contents.txt
Added 01113-86-5485-936 (9 versions)
Working on document /local/janssen-uplib/docs/01113-86-0975-795
  Adding header 'apparent-mime-type' I to 01113-86-0975-795
  Adding header 'categories' I (article) to 01113-86-0975-795
  Adding header 'date' I (20050414) to 01113-86-0975-795
  Adding header 'sha-hash' I to 01113-86-0975-795
  Adding header 'source' IT to 01113-86-0975-795
  Created empty doc Document<stored/uncompressed,indexed<id:01113-86-0975-795> stored/uncompressed,indexed<uplibdate:20050418>
stored/uncompressed,indexed<uplibtype:whole>>
  Using charset utf8 for contents.txt
  Using language en for contents.txt
    page 0 (1851):  About sponsorship Simplifying 
    page 1 (2900):  Latvia and Lithuania, Estonia'
    page 2 (3088):  How much fairness is gained fo
    page 3 (5317):  At the time of its reform, Est
    page 4 (1101):  In part, the tax system is bur
  Using charset utf8 for contents.txt
  Using language en for contents.txt
Added 01113-86-0975-795 (6 versions)
Working on document /local/janssen-uplib/docs/01113-62-2881-882
  Adding header 'apparent-mime-type' I to 01113-62-2881-882
  Adding header 'categories' I (article) to 01113-62-2881-882
  Adding header 'date' I (20050328) to 01113-62-2881-882
  Adding header 'keywords' I (neuroeconomics) to 01113-62-2881-882
  Adding header 'sha-hash' I to 01113-62-2881-882
  Adding header 'title' IT (Neuroeconomics:  Why Logic Often Takes a Backseat) to 01113-62-2881-882
  Created empty doc Document<stored/uncompressed,indexed<id:01113-62-2881-882> stored/uncompressed,indexed<uplibdate:20050415>
stored/uncompressed,indexed<uplibtype:whole>>
  Using charset utf8 for contents.txt
  Using language en for contents.txt
    page 0 (2957):  Close Window MARCH 28, 2005 EC
    page 1 (3856):  these attacks on rationality ?
    page 2 (484):  Even believers in neuroeconomi
  Using charset utf8 for contents.txt
  Using language en for contents.txt
Added 01113-62-2881-882 (4 versions)
Working on document /local/janssen-uplib/docs/01113-44-7730-580
  Adding header 'apparent-mime-type' I to 01113-44-7730-580
  Adding header 'categories' I (flowport) to 01113-44-7730-580
  Adding header 'categories' I (receipt) to 01113-44-7730-580
  Adding header 'categories' I (flowport) to 01113-44-7730-580
  Adding header 'categories' I (flowport) to 01113-44-7730-580
  Adding header 'comment' IT to 01113-44-7730-580
  Adding header 'sha-hash' I to 01113-44-7730-580
  Adding header 'title' IT (fax receipt for JCDL 2005 demo submission document) to 01113-44-7730-580
  Created empty doc Document<stored/uncompressed,indexed<id:01113-44-7730-580> stored/uncompressed,indexed<uplibdate:20050413>
stored/uncompressed,indexed<uplibtype:whole>>
  Using charset utf8 for contents.txt
  Using language en for contents.txt
    page 0 (3383):  Transmission fleport Dare T lo
    page 1 (2697):  ACM Permission and Release For
  Using charset utf8 for contents.txt
  Using language en for contents.txt
Added 01113-44-7730-580 (3 versions)
Working on document /local/janssen-uplib/docs/01113-44-7684-477
  Adding header 'apparent-mime-type' I to 01113-44-7684-477
  Adding header 'authors' IT to 01113-44-7684-477
  Adding header 'categories' I (flowport) to 01113-44-7684-477
  Adding header 'categories' I (article) to 01113-44-7684-477
  Adding header 'categories' I (ebook) to 01113-44-7684-477
  Adding header 'categories' I (flowport) to 01113-44-7684-477
  Adding header 'categories' I (flowport) to 01113-44-7684-477
  Adding header 'citation' I to 01113-44-7684-477
  Adding header 'date' I (20050117) to 01113-44-7684-477
  Adding header 'sha-hash' I to 01113-44-7684-477
  Adding header 'title' IT (Signs of Life for E-books in 2004) to 01113-44-7684-477
  Created empty doc Document<stored/uncompressed,indexed<id:01113-44-7684-477> stored/uncompressed,indexed<uplibdate:20050413>
stored/uncompressed,indexed<uplibtype:whole>>
  Using charset utf8 for contents.txt
  Using language en for contents.txt
    page 0 (6514):  Signs ofLife for E-books in 20
  Using charset utf8 for contents.txt
  Using language en for contents.txt
Added 01113-44-7684-477 (2 versions)
Optimizing...
IW 0 [main]: optimize: index now _94:c8686
IW 0 [main]:   flush: segment=_95 docStoreSegment=_95 docStoreOffset=0 flushDocs=true flushDeletes=false
flushDocStores=true numDocs=26 numBufDelTerms=0
IW 0 [main]:   index before flush _94:c8686

closeDocStore: 2 files to flush to segment _95

flush postings as segment _95 numDocs=26
  oldRAMSize=269104 newFlushedSize=114988 docs/MB=237.094 new/old=42.73%
IW 0 [main]: checkpoint: wrote segments file "segments_ic"
IFD [main]: now checkpoint "segments_ic" [2 segments ; isCommit = true]
IFD [main]: deleteCommits: now remove commit "segments_ib"
IFD [main]: delete "segments_ib"
IW 0 [main]: checkpoint: wrote segments file "segments_id"
IFD [main]: now checkpoint "segments_id" [2 segments ; isCommit = true]
IFD [main]: deleteCommits: now remove commit "segments_ic"
IFD [main]: delete "_95.fnm"
IFD [main]: delete "_95.frq"
IFD [main]: delete "_95.prx"
IFD [main]: delete "_95.tis"
IFD [main]: delete "_95.tii"
IFD [main]: delete "_95.nrm"
IFD [main]: delete "_95.fdx"
IFD [main]: delete "_95.fdt"
IFD [main]: delete "segments_ic"
IW 0 [main]: LMP: findMerges: 2 segments
IW 0 [main]: LMP:   level 6.4501038 to 7.2001038: 1 segments
IW 0 [main]: LMP:   level -1.0 to 5.070234: 1 segments
IW 0 [main]: CMS: now merge
IW 0 [main]: CMS:   index: _94:c8686 _95:c26
IW 0 [main]: CMS:   no more merges pending; now return
IW 0 [main]: add merge to pendingMerges: _94:c8686 _95:c26 [optimize] [total 1 pending]
IW 0 [main]: CMS: now merge
IW 0 [main]: CMS:   index: _94:c8686 _95:c26
IW 0 [main]: CMS:   consider merge _94:c8686 _95:c26 into _96 [optimize]
IW 0 [main]: CMS:     launch new thread [Thread-0]
IW 0 [Thread-0]: CMS:   merge thread: start
IW 0 [main]: CMS:   no more merges pending; now return
IW 0 [Thread-0]: now merge
  merge=_94:c8686 _95:c26 into _96 [optimize]
  index=_94:c8686 _95:c26
IW 0 [Thread-0]: merging _94:c8686 _95:c26 into _96 [optimize]
IW 0 [Thread-0]: merge: total 8712 docs
IW 0 [Thread-0]: hit exception during merge; now refresh deleter on segment _96
IFD [Thread-0]: refresh [prefix=_96]: removing newly created unreferenced file "_96.fdt"
IFD [Thread-0]: delete "_96.fdt"
IFD [Thread-0]: refresh [prefix=_96]: removing newly created unreferenced file "_96.fdx"
IFD [Thread-0]: delete "_96.fdx"
IFD [Thread-0]: refresh [prefix=_96]: removing newly created unreferenced file "_96.fnm"
IFD [Thread-0]: delete "_96.fnm"
IFD [Thread-0]: refresh [prefix=_96]: removing newly created unreferenced file "_96.frq"
IFD [Thread-0]: delete "_96.frq"
IFD [Thread-0]: refresh [prefix=_96]: removing newly created unreferenced file "_96.prx"
IFD [Thread-0]: delete "_96.prx"
IFD [Thread-0]: refresh [prefix=_96]: removing newly created unreferenced file "_96.tii"
IFD [Thread-0]: delete "_96.tii"
IFD [Thread-0]: refresh [prefix=_96]: removing newly created unreferenced file "_96.tis"
IFD [Thread-0]: delete "_96.tis"
IW 0 [Thread-0]: hit exception during merge
java.io.IOException: background merge hit exception: _94:c8686 _95:c26 into _96 [optimize]
	at org.apache.lucene.index.IndexWriter.optimize(IndexWriter.java:1705)
	at org.apache.lucene.index.IndexWriter.optimize(IndexWriter.java:1654)
	at com.parc.uplib.indexing.LuceneIndexing.update(LuceneIndexing.java:414)
	at com.parc.uplib.indexing.LuceneIndexing.main(LuceneIndexing.java:659)
Caused by: org.apache.lucene.index.CorruptIndexException: docs out of order (8692 <= 10221
)
	at org.apache.lucene.index.SegmentMerger.appendPostings(SegmentMerger.java:474)
	at org.apache.lucene.index.SegmentMerger.mergeTermInfo(SegmentMerger.java:430)
	at org.apache.lucene.index.SegmentMerger.mergeTermInfos(SegmentMerger.java:402)
	at org.apache.lucene.index.SegmentMerger.mergeTerms(SegmentMerger.java:366)
	at org.apache.lucene.index.SegmentMerger.merge(SegmentMerger.java:123)
	at org.apache.lucene.index.IndexWriter.mergeMiddle(IndexWriter.java:3002)
	at org.apache.lucene.index.IndexWriter.merge(IndexWriter.java:2751)
	at org.apache.lucene.index.ConcurrentMergeScheduler$MergeThread.run(ConcurrentMergeScheduler.java:240)
Exception in thread "Thread-0" org.apache.lucene.index.MergePolicy$MergeException: org.apache.lucene.index.CorruptIndexException:
docs out of order (8692 <= 10221 )
	at org.apache.lucene.index.ConcurrentMergeScheduler$MergeThread.run(ConcurrentMergeScheduler.java:274)
Caused by: org.apache.lucene.index.CorruptIndexException: docs out of order (8692 <= 10221
)
	at org.apache.lucene.index.SegmentMerger.appendPostings(SegmentMerger.java:474)
	at org.apache.lucene.index.SegmentMerger.mergeTermInfo(SegmentMerger.java:430)
	at org.apache.lucene.index.SegmentMerger.mergeTermInfos(SegmentMerger.java:402)
	at org.apache.lucene.index.SegmentMerger.mergeTerms(SegmentMerger.java:366)
	at org.apache.lucene.index.SegmentMerger.merge(SegmentMerger.java:123)
	at org.apache.lucene.index.IndexWriter.mergeMiddle(IndexWriter.java:3002)
	at org.apache.lucene.index.IndexWriter.merge(IndexWriter.java:2751)
	at org.apache.lucene.index.ConcurrentMergeScheduler$MergeThread.run(ConcurrentMergeScheduler.java:240)
>


---------------------------------------------------------------------
To unsubscribe, e-mail: java-user-unsubscribe@lucene.apache.org
For additional commands, e-mail: java-user-help@lucene.apache.org


Mime
View raw message