lucene-java-user mailing list archives

From Grant Ingersoll <gsing...@apache.org>
Subject Re: lucene-core-2.2.0.jar broken? CorruptIndexException?
Date Thu, 29 Nov 2007 18:52:53 GMT
Are you still getting the original exception too, or just the
ArrayIndexOutOfBoundsException now?  Also, are you doing anything else
to the index while this is happening?  The code at the point of the
exception below is trying to properly handle deleted documents.
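
To make that concrete, here is a minimal sketch of this kind of failure. It is not Lucene's actual code: FixedBits and countLiveDocs are hypothetical stand-ins for a per-segment deleted-docs bit vector and a postings loop. The sizes are taken from the log below (19686 docs in segment _21, doc ID 20672 from the exception message); which segment the bad doc ID really came from is an assumption made only for illustration.

```java
public class DeletedDocsSketch {

    /** Hypothetical fixed-size bit vector: one bit per document in a segment. */
    static final class FixedBits {
        private final byte[] bits;

        FixedBits(int size) {
            // Round up to whole bytes, eight documents per byte.
            this.bits = new byte[(size + 7) >> 3];
        }

        void set(int i) {
            bits[i >> 3] |= (byte) (1 << (i & 7));
        }

        boolean get(int i) {
            // No explicit range check: an index past the end of the backing
            // array surfaces as ArrayIndexOutOfBoundsException.
            return (bits[i >> 3] & (1 << (i & 7))) != 0;
        }
    }

    /** Counts postings whose document is not marked deleted. */
    static int countLiveDocs(int[] postingDocIds, FixedBits deleted) {
        int live = 0;
        for (int doc : postingDocIds) {
            if (!deleted.get(doc)) {
                live++;
            }
        }
        return live;
    }

    public static void main(String[] args) {
        // Assumption: a bit vector sized for a 19686-document segment.
        FixedBits deleted = new FixedBits(19686);
        deleted.set(5);

        // Doc IDs inside the segment are handled normally.
        System.out.println(countLiveDocs(new int[] {1, 5, 7}, deleted));

        try {
            // A posting that refers to a doc ID beyond the segment's size.
            countLiveDocs(new int[] {1, 20672}, deleted);
        } catch (ArrayIndexOutOfBoundsException e) {
            System.out.println("doc ID outside the deleted-docs vector: " + e.getMessage());
        }
    }
}
```

The point of the sketch is only that a doc ID larger than the segment's document count cannot be looked up in a vector sized for that segment, which is why the answers to the questions above matter.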

-Grant

On Nov 29, 2007, at 1:34 PM, Bill Janssen wrote:

>> Can you try running with the trunk version of Lucene (2.3-dev) and see
>> if the error still occurs?  E.g., you can download this morning's build here:
>>
>>  http://lucene.zones.apache.org:8080/hudson/job/Lucene-Nightly/288/artifact/artifacts
>
> Still there.  Here's the dump with last night's build:
>
> /Library/Java/Home/bin/java '-Dcom.parc.uplib.indexing.debugMode=true' '-Dcom.parc.uplib.indexing.indexProperties=contents:title:categories$,*:date@:apparent-mime-type*:authors$\sand\s:comment:abstract:email-message-id*:email-guid*:email-subject:email-from-name:email-from-address*:email-attachment-to*:email-thread-index*:email-references$,*:email-in-reply-to$,*:keywords$,*:album:performer:composer:music-genre*:audio-length:accompaniment:paragraph-ids$,*:sha-hash*' -classpath "/local/uplib/share/UpLib-1.7/code/lucene-core-2.3-2007-11-29_02-49-31.jar:/local/uplib/share/UpLib-1.7/code/LuceneIndexing.jar" -Dorg.apache.lucene.writeLockTimeout=20000 com.parc.uplib.indexing.LuceneIndexing "/local/janssen-uplib/index" update /local/janssen-uplib/docs 01179-00-0750-547 01178-90-9186-558 01178-81-4212-772 01178-81-3305-217 01178-73-1029-141 01178-72-8365-803
> updating
> doc_root_dir is /local/janssen-uplib/docs
> IFD [main]: setInfoStream deletionPolicy=org.apache.lucene.index.KeepOnlyLastCommitDeletionPolicy@462851
> IW 0 [main]: setInfoStream: dir=org.apache.lucene.store.FSDirectory@/local/janssen-uplib/index autoCommit=true mergePolicy=org.apache.lucene.index.LogByteSizeMergePolicy@c56c60 mergeScheduler=org.apache.lucene.index.ConcurrentMergeScheduler@4e280c ramBufferSizeMB=16.0 maxBuffereDocs=-1 maxBuffereDeleteTerms=-1 maxFieldLength=10000 index=_21:c19686 _22:c92
> IW 0 [main]: setMaxFieldLength 2147483647
> Working on document /local/janssen-uplib/docs/01179-00-0750-547
>  Adding header 'abstract' IT to 01179-00-0750-547
>  Adding header 'apparent-mime-type' I to 01179-00-0750-547
>  Adding header 'authors' IT to 01179-00-0750-547
>  Adding header 'categories' I (ebooks) to 01179-00-0750-547
>  Adding header 'categories' I (economics) to 01179-00-0750-547
>  Adding header 'categories' I (paper) to 01179-00-0750-547
>  Adding header 'citation' I to 01179-00-0750-547
>  Adding header 'date' I (20070128) to 01179-00-0750-547
>  Adding header 'sha-hash' I to 01179-00-0750-547
>  Adding header 'title' IT (Heterogeneity in Price Stickiness and the Real Effects of Monetary Shocks) to 01179-00-0750-547
>  Created empty doc Document<stored/uncompressed,indexed<id:01179-00-0750-547> stored/uncompressed,indexed<uplibdate:20070512> stored/uncompressed,indexed<uplibtype:whole>>
>  Using charset utf8 for contents.txt
>  Using language en for contents.txt
>    page 0 (2181):  Heterogeneity in Price Stickin
>    page 1 (2927):  1 Introduction There is ample
>    page 2 (3135):  In the presence of strategic c
>    page 3 (3128):  Motivated by those questions,
>    page 4 (3214):  ploring the tractability of th
>    page 5 (2491):  model with Taylor staggered wa
>    page 6 (1548):  real rigidities (Ball and Rome
>    page 7 (3098):  2.2 Calibrating the sectoral d
>    page 8 (1913):  distribution of price stickine
>    page 9 (1952):  reported in Table 1. Hencefort
>    page 10 (1635):  Figure 2 presents analogous re
>    page 11 (1743):  In the absence of strategic co
>    page 12 (2806):  Corollary 1 For an arbitrary h
>    page 13 (2380):  2.4.2 Growth rate shocks In th
>    page 14 (2962):  price changes. With heterogene
>    page 15 (3265):  ties and heterogeneity in the
>    page 16 (1962):  complementarities. The results
>    page 17 (751):  to the response of the heterog
>    page 18 (489):  economies are embedded into th
>    page 19 (3295):  2.6 Fitting IRFs with an ident
>    page 20 (2066):  Table 3a: Best-Fitting Duratio
>    page 21 (2444):  This is an important step beca
>    page 22 (1976):  where ? is the discount factor
>    page 23 (1183):  Et "? Ct+1 Ct ¦?? It Pt Pt+1 #
>    page 24 (2188):  can be rewritten as: Pk,t = £
>    page 25 (1370):  pt = Z 1 0 f (k) pk,tdk, (10)
>    page 26 (3269):  Heterogeneity in price stickin
>    page 27 (3117):  Irrespective of the net effect
>    page 28 (2084):  set of parameters involve high
>    page 29 (575):  0 5 10 15 20 25 30 35 40 0 x 1
>    page 30 (2185):  output and falling prices in a
>    page 31 (2358):  price changes that minimizes t
>    page 32 (2689):  These results are fully consis
>    page 33 (3600):  different sources of real rigi
>    page 34 (3168):  work in a model with heterogen
>    page 35 (2557):  single equation estimation of
>    page 36 (1326):  Taking the limit as Æ ? 0 in e
>    page 37 (1796):  The output gap is constant at
>    page 38 (1066):  The corresponding path for the
>    page 39 (1347):  4) Proof of Corollaries 1 and
>    page 40 (2421):  Therefore, for ? Å 0, the expe
>    page 41 (1343):  p (t) = Z 1 0 f (k) ? ?? ?? R
>    page 42 (2117):  As ? ? 0, this clearly converg
>    page 43 (1375):  model around the zero inflatio
>    page 44 (1497):  pt = Z 1 0 f (k) pk,tdk, yt =
>    page 45 (1128):  Table A.3: Best-Fitting Durati
>    page 46 (898):  Multiplying by f (k) ?k and in
>    page 47 (1072):  Now, from (23): ?kxk,t = pk,t
>    page 48 (268):  Finally, let ¹t ? pt ? pt?1 de
>    page 49 (1694):  References [1] Altissimo, F.,
>    page 50 (1874):  [14] Bils, M., P. Klenow and O
>    page 51 (2091):  [27] Carlton, D. (1986), “The
>    page 52 (1846):  [39] Dixon, H. and E. Kara (20
>    page 53 (1530):  [51] Ohanian, L., A. Stockman
>  Using charset utf8 for contents.txt
>  Using language en for contents.txt
> Added 01179-00-0750-547 (55 versions)
> Working on document /local/janssen-uplib/docs/01178-90-9186-558
>  Adding header 'abstract' IT to 01178-90-9186-558
>  Adding header 'apparent-mime-type' I to 01178-90-9186-558
>  Adding header 'authors' IT to 01178-90-9186-558
>  Adding header 'authors' IT to 01178-90-9186-558
>  Adding header 'authors' IT to 01178-90-9186-558
>  Adding header 'authors' IT to 01178-90-9186-558
>  Adding header 'authors' IT to 01178-90-9186-558
>  Adding header 'categories' I (paper) to 01178-90-9186-558
>  Adding header 'categories' I (ebooks) to 01178-90-9186-558
>  Adding header 'date' I (20050500) to 01178-90-9186-558
>  Adding header 'sha-hash' I to 01178-90-9186-558
>  Adding header 'title' IT (Visual-Syntactic Text Formatting: A New Method to Enhance Online) to 01178-90-9186-558
>  Created empty doc Document<stored/uncompressed,indexed<id:01178-90-9186-558> stored/uncompressed,indexed<uplibdate:20070511> stored/uncompressed,indexed<uplibtype:whole>>
>  Using charset utf8 for contents.txt
>  Using language en for contents.txt
>    page 0 (3332):  Visual-Syntactic Text Formatti
>    page 1 (1461):  into this: To make these chang
>    page 2 (4620):  complex than a simple, concate
>    page 3 (4723):  Among some poorer readers, met
>    page 4 (3827):  method does not extract or dir
>    page 5 (3388):  (ACT scores) with gains in com
>    page 6 (3693):  digital text actually improve
>    page 7 (3400):  exam were administered. For re
>    page 8 (3413):  Intermediate and long-term ret
>    page 9 (4219):  Student preference and survey
>    page 10 (4417):  Discussion In print media, sim
>    page 11 (4358):  time; however, the opposite tr
>    page 12 (3950):  More time spent actually readi
>    page 13 (4184):  In education, the VSTF method
>    page 14 (3270):  References Armbruster, B.B. (2
>    page 15 (3176):  April 19–20). Neuroimaging, la
>    page 16 (3350):  Klare, G.R., Nichols, W.H., &
>    page 17 (4098):  and narrative skills connect w
>    page 18 (3070):  He has conducted laboratory-ba
>  Using charset utf8 for contents.txt
>  Using language en for contents.txt
> Added 01178-90-9186-558 (20 versions)
> Working on document /local/janssen-uplib/docs/01178-81-4212-772
>  Adding header 'abstract' IT to 01178-81-4212-772
>  Adding header 'apparent-mime-type' I to 01178-81-4212-772
>  Adding header 'categories' I (newspaper) to 01178-81-4212-772
>  Adding header 'categories' I (article) to 01178-81-4212-772
>  Adding header 'categories' I (fun) to 01178-81-4212-772
>  Adding header 'categories' I (historical) to 01178-81-4212-772
>  Adding header 'date' I (19340429) to 01178-81-4212-772
>  Adding header 'sha-hash' I to 01178-81-4212-772
>  Adding header 'title' IT (Gigantic Robots, Controlled by Wireless, to Fight Our Battles) to 01178-81-4212-772
>  Created empty doc Document<stored/uncompressed,indexed<id:01178-81-4212-772> stored/uncompressed,indexed<uplibdate:20070510> stored/uncompressed,indexed<uplibtype:whole>>
>  Using charset utf8 for contents.txt
>  Using language en for contents.txt
>    page 0 (261):  iganlic obok ntroIIed b ireles
>  Using charset utf8 for contents.txt
>  Using language en for contents.txt
> Added 01178-81-4212-772 (2 versions)
> Working on document /local/janssen-uplib/docs/01178-81-3305-217
>  Adding header 'apparent-mime-type' I to 01178-81-3305-217
>  Adding header 'authors' IT to 01178-81-3305-217
>  Adding header 'categories' I (cartoon) to 01178-81-3305-217
>  Adding header 'keywords' I (incentive) to 01178-81-3305-217
>  Adding header 'sha-hash' I to 01178-81-3305-217
>  Created empty doc Document<stored/uncompressed,indexed<id:01178-81-3305-217> stored/uncompressed,indexed<uplibdate:20070510> stored/uncompressed,indexed<uplibtype:whole>>
> Added 01178-81-3305-217 (1 versions)
> Working on document /local/janssen-uplib/docs/01178-73-1029-141
>  Adding header 'apparent-mime-type' I to 01178-73-1029-141
>  Adding header 'authors' IT to 01178-73-1029-141
>  Adding header 'categories' I (article) to 01178-73-1029-141
>  Adding header 'date' I (20070514) to 01178-73-1029-141
>  Adding header 'sha-hash' I to 01178-73-1029-141
>  Adding header 'title' IT (Critical Mass:  Everyone listens to Walter Mossberg) to 01178-73-1029-141
>  Created empty doc Document<stored/uncompressed,indexed<id:01178-73-1029-141> stored/uncompressed,indexed<uplibdate:20070509> stored/uncompressed,indexed<uplibtype:whole>>
>  Using charset utf8 for contents.txt
>  Using language en for contents.txt
>    page 0 (328):  Go Back Print this page Skip t
>    page 1 (1024):  Mossberg assesses technology p
>    page 2 (4281):  that Mossberg has since descri
>    page 3 (2610):  Titus said, “It will come with
>    page 4 (4412):  “We’d love that,” Mermelstein
>    page 5 (4035):  half years, and then transferr
>    page 6 (4952):  partly to the clout of the new
>    page 7 (5049):  Mossberg will often be the fir
>    page 8 (4330):  Of the blogs that review produ
>    page 9 (31):  • Del.icio.us • Reddit Object
>  Using charset utf8 for contents.txt
>  Using language en for contents.txt
> Added 01178-73-1029-141 (11 versions)
> Working on document /local/janssen-uplib/docs/01178-72-8365-803
>  Adding header 'apparent-mime-type' I to 01178-72-8365-803
>  Adding header 'authors' IT to 01178-72-8365-803
>  Adding header 'categories' I (ebooks) to 01178-72-8365-803
>  Adding header 'categories' I (article) to 01178-72-8365-803
>  Adding header 'date' I (20061104) to 01178-72-8365-803
>  Adding header 'sha-hash' I to 01178-72-8365-803
>  Adding header 'title' IT (Selling Ebooks on the Web via MHTML) to 01178-72-8365-803
>  Created empty doc Document<stored/uncompressed,indexed<id:01178-72-8365-803> stored/uncompressed,indexed<uplibdate:20070509> stored/uncompressed,indexed<uplibtype:whole>>
>  Using charset utf8 for contents.txt
>  Using language en for contents.txt
>    page 0 (3499):  To: ebook-community@yahoogroup
>    page 1 (2904):  the IETF world, to be the resp
>  Using charset utf8 for contents.txt
>  Using language en for contents.txt
> Added 01178-72-8365-803 (3 versions)
> Optimizing...
> IW 0 [main]: optimize: index now _21:c19686 _22:c92
> IW 0 [main]:   flush: segment=_23 docStoreSegment=_23 docStoreOffset=0 flushDocs=true flushDeletes=true flushDocStores=true numDocs=92 numBufDelTerms=6
> IW 0 [main]:   index before flush _21:c19686 _22:c92
>
> closeDocStore: 2 files to flush to segment _23
>
> flush postings as segment _23 numDocs=92
>  oldRAMSize=528092 newFlushedSize=263152 docs/MB=366.59 new/old=49.831%
> IW 0 [main]: flush 6 buffered deleted terms on 3 segments.
> flushed 92 deleted documents
> IW 0 [main]: checkpoint: wrote segments file "segments_48"
> IFD [main]: now checkpoint "segments_48" [3 segments ; isCommit = true]
> IFD [main]: deleteCommits: now remove commit "segments_47"
> IFD [main]: delete "segments_47"
> IW 0 [main]: checkpoint: wrote segments file "segments_49"
> IFD [main]: now checkpoint "segments_49" [3 segments ; isCommit = true]
> IFD [main]: deleteCommits: now remove commit "segments_48"
> IFD [main]: delete "_23.fnm"
> IFD [main]: delete "_23.frq"
> IFD [main]: delete "_23.prx"
> IFD [main]: delete "_23.tis"
> IFD [main]: delete "_23.tii"
> IFD [main]: delete "_23.nrm"
> IFD [main]: delete "_23.fdx"
> IFD [main]: delete "_23.fdt"
> IFD [main]: delete "segments_48"
> IW 0 [main]: LMP: findMerges: 3 segments
> IW 0 [main]: LMP:   level 6.744677 to 7.494677: 1 segments
> IW 0 [main]: LMP:   level -1.0 to 5.513348: 2 segments
> IW 0 [main]: CMS: now merge
> IW 0 [main]: CMS:   index: _21:c19686 _22:c92 _23:c92
> IW 0 [main]: CMS:   no more merges pending; now return
> IW 0 [main]: add merge to pendingMerges: _21:c19686 _22:c92 _23:c92 [optimize] [total 1 pending]
> IW 0 [main]: CMS: now merge
> IW 0 [main]: CMS:   index: _21:c19686 _22:c92 _23:c92
> IW 0 [main]: CMS:   consider merge _21:c19686 _22:c92 _23:c92 into _24 [optimize]
> IW 0 [main]: CMS:     launch new thread [Thread-0]
> IW 0 [Thread-0]: CMS:   merge thread: start
> IW 0 [main]: CMS:   no more merges pending; now return
> IW 0 [Thread-0]: now merge
>  merge=_21:c19686 _22:c92 _23:c92 into _24 [optimize]
>  index=_21:c19686 _22:c92 _23:c92
> IW 0 [Thread-0]: merging _21:c19686 _22:c92 _23:c92 into _24 [optimize]
> IW 0 [Thread-0]: merge: total 19686 docs
> IW 0 [Thread-0]: hit exception during merge; now refresh deleter on segment _24
> IFD [Thread-0]: refresh [prefix=_24]: removing newly created unreferenced file "_24.fdt"
> IFD [Thread-0]: delete "_24.fdt"
> IFD [Thread-0]: refresh [prefix=_24]: removing newly created unreferenced file "_24.fdx"
> IFD [Thread-0]: delete "_24.fdx"
> IFD [Thread-0]: refresh [prefix=_24]: removing newly created unreferenced file "_24.fnm"
> IFD [Thread-0]: delete "_24.fnm"
> IFD [Thread-0]: refresh [prefix=_24]: removing newly created unreferenced file "_24.frq"
> IFD [Thread-0]: delete "_24.frq"
> IFD [Thread-0]: refresh [prefix=_24]: removing newly created unreferenced file "_24.prx"
> IFD [Thread-0]: delete "_24.prx"
> IFD [Thread-0]: refresh [prefix=_24]: removing newly created unreferenced file "_24.tii"
> IFD [Thread-0]: delete "_24.tii"
> IFD [Thread-0]: refresh [prefix=_24]: removing newly created unreferenced file "_24.tis"
> IFD [Thread-0]: delete "_24.tis"
> IW 0 [Thread-0]: hit exception during merge
> Exception in thread "Thread-0" org.apache.lucene.index.MergePolicy$MergeException: java.lang.ArrayIndexOutOfBoundsException: Array index out of range: 20672
> 	at org.apache.lucene.index.ConcurrentMergeScheduler$MergeThread.run(ConcurrentMergeScheduler.java:274)
> Caused by: java.lang.ArrayIndexOutOfBoundsException: Array index out of range: 20672
> 	at org.apache.lucene.util.BitVector.get(BitVector.java:72)
> 	at org.apache.lucene.index.SegmentTermDocs.next(SegmentTermDocs.java:118)
> 	at org.apache.lucene.index.SegmentTermPositions.next(SegmentTermPositions.java:95)
> 	at org.apache.lucene.index.SegmentMerger.appendPostings(SegmentMerger.java:467)
> 	at org.apache.lucene.index.SegmentMerger.mergeTermInfo(SegmentMerger.java:430)
> 	at org.apache.lucene.index.SegmentMerger.mergeTermInfos(SegmentMerger.java:402)
> 	at org.apache.lucene.index.SegmentMerger.mergeTerms(SegmentMerger.java:366)
> 	at org.apache.lucene.index.SegmentMerger.merge(SegmentMerger.java:123)
> 	at org.apache.lucene.index.IndexWriter.mergeMiddle(IndexWriter.java:3002)
> 	at org.apache.lucene.index.IndexWriter.merge(IndexWriter.java:2751)
> 	at org.apache.lucene.index.ConcurrentMergeScheduler$MergeThread.run(ConcurrentMergeScheduler.java:240)
> java.io.IOException: background merge hit exception: _21:c19686 _22:c92 _23:c92 into _24 [optimize]
> 	at org.apache.lucene.index.IndexWriter.optimize(IndexWriter.java:1705)
> 	at org.apache.lucene.index.IndexWriter.optimize(IndexWriter.java:1654)
> 	at com.parc.uplib.indexing.LuceneIndexing.update(LuceneIndexing.java:419)
> 	at com.parc.uplib.indexing.LuceneIndexing.main(LuceneIndexing.java:664)
> Caused by: java.lang.ArrayIndexOutOfBoundsException: Array index out of range: 20672
> 	at org.apache.lucene.util.BitVector.get(BitVector.java:72)
> 	at org.apache.lucene.index.SegmentTermDocs.next(SegmentTermDocs.java:118)
> 	at org.apache.lucene.index.SegmentTermPositions.next(SegmentTermPositions.java:95)
> 	at org.apache.lucene.index.SegmentMerger.appendPostings(SegmentMerger.java:467)
> 	at org.apache.lucene.index.SegmentMerger.mergeTermInfo(SegmentMerger.java:430)
> 	at org.apache.lucene.index.SegmentMerger.mergeTermInfos(SegmentMerger.java:402)
> 	at org.apache.lucene.index.SegmentMerger.mergeTerms(SegmentMerger.java:366)
> 	at org.apache.lucene.index.SegmentMerger.merge(SegmentMerger.java:123)
> 	at org.apache.lucene.index.IndexWriter.mergeMiddle(IndexWriter.java:3002)
> 	at org.apache.lucene.index.IndexWriter.merge(IndexWriter.java:2751)
> 	at org.apache.lucene.index.ConcurrentMergeScheduler$MergeThread.run(ConcurrentMergeScheduler.java:240)
> janssen-home : /u 75 %
>
> ---------------------------------------------------------------------
> To unsubscribe, e-mail: java-user-unsubscribe@lucene.apache.org
> For additional commands, e-mail: java-user-help@lucene.apache.org
>

--------------------------
Grant Ingersoll
http://lucene.grantingersoll.com

Lucene Helpful Hints:
http://wiki.apache.org/lucene-java/BasicsOfPerformance
http://wiki.apache.org/lucene-java/LuceneFAQ





