lucene-dev mailing list archives

From Andi Vajda <va...@osafoundation.org>
Subject possible bug with indexing with term vectors
Date Fri, 28 Sep 2007 20:26:34 GMT

On Fri, 28 Sep 2007, Andi Vajda wrote:

> I found a bug with indexing documents that contain fields with Term Vectors. 
> The indexing fails with 'read past EOF' errors in what seems to be the index
> optimizing phase during addIndexes(). (I index first into a RAMDirectory,
> then addIndexes() into an FSDirectory.)
>
> I have not formally filed the bug yet, as I still need to isolate the code. If I
> turn off term vectors, indexing completes fine.

I tried all morning to isolate the problem, but I can't seem to reproduce it
in a simple unit test. In my application, I can trigger the error by doing
even less: just creating an FSDirectory and adding documents whose fields have
term vectors makes optimizing the index fail with the error below. I even
tried adding the same documents, in the same order, in the unit test, but to
no avail: it just works.

What is different about my environment? Well, I'm running PyLucene, but the
new one, the one using Apple's Java VM, the same VM I'm using to run the unit
test. And I'm not doing anything special like calling back into Python; I'm
just calling regular Lucene APIs, adding documents to an IndexWriter on an
FSDirectory using a StandardAnalyzer. If I stop using term vectors,
everything works fine.

I'd like to get to the bottom of this but could use some help. Does the
stack trace below ring a bell? Is there a way to run the whole indexing and
optimizing in a single thread?
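On the single-thread question, one option, assuming the Lucene 2.3-era IndexWriter API (the one whose ConcurrentMergeScheduler appears in the trace), is to replace the default scheduler with SerialMergeScheduler, which runs each merge, including the ones optimize() requests, synchronously in the calling thread. A merge failure should then surface directly from addDocument() or optimize() rather than from a background MergeThread. This is a minimal sketch: the class SerialMergeIndexing, its method and the "contents" field name are hypothetical, and the constructors used here belong to the 2.x API.

```java
// Sketch assuming the Lucene 2.3-era API (IndexWriter.setMergeScheduler, Field.Index.TOKENIZED).
import java.io.IOException;

import org.apache.lucene.analysis.standard.StandardAnalyzer;
import org.apache.lucene.document.Document;
import org.apache.lucene.document.Field;
import org.apache.lucene.index.IndexReader;
import org.apache.lucene.index.IndexWriter;
import org.apache.lucene.index.SerialMergeScheduler;
import org.apache.lucene.store.Directory;
import org.apache.lucene.store.RAMDirectory;

public class SerialMergeIndexing {

    /** Hypothetical helper: indexes texts with term vectors, optimizes, returns the doc count. */
    public static int indexAndOptimize(Directory dir, String[] texts) throws IOException {
        IndexWriter writer = new IndexWriter(dir, new StandardAnalyzer(), true);
        // Swap out the default ConcurrentMergeScheduler so that every merge,
        // including the one triggered by optimize(), runs in this thread.
        writer.setMergeScheduler(new SerialMergeScheduler());
        try {
            for (String text : texts) {
                Document doc = new Document();
                doc.add(new Field("contents", text, Field.Store.NO,
                        Field.Index.TOKENIZED, Field.TermVector.WITH_POSITIONS_OFFSETS));
                writer.addDocument(doc);
            }
            writer.optimize();
        } finally {
            writer.close();
        }
        // Reopen the index to confirm all documents made it through the merge.
        IndexReader reader = IndexReader.open(dir);
        try {
            return reader.numDocs();
        } finally {
            reader.close();
        }
    }

    public static void main(String[] args) throws IOException {
        int n = indexAndOptimize(new RAMDirectory(), new String[] {"hello world", "term vectors"});
        System.out.println(n + " documents indexed");
    }
}
```

Pointing the same helper at an FSDirectory instead of a RAMDirectory would keep the whole indexing and optimizing run on one thread, which may make the failure easier to step through in a debugger.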

Thanks!

Andi..

Exception in thread "Thread-4" org.apache.lucene.index.MergePolicy$MergeException: java.io.IOException: read past EOF
         at org.apache.lucene.index.ConcurrentMergeScheduler$MergeThread.run(ConcurrentMergeScheduler.java:263)
Caused by: java.io.IOException: read past EOF
         at org.apache.lucene.store.BufferedIndexInput.refill(BufferedIndexInput.java:146)
         at org.apache.lucene.store.BufferedIndexInput.readByte(BufferedIndexInput.java:38)
         at org.apache.lucene.store.IndexInput.readVInt(IndexInput.java:76)
         at org.apache.lucene.index.TermVectorsReader.get(TermVectorsReader.java:207)
         at org.apache.lucene.index.SegmentReader.getTermFreqVectors(SegmentReader.java:692)
         at org.apache.lucene.index.SegmentMerger.mergeVectors(SegmentMerger.java:279)
         at org.apache.lucene.index.SegmentMerger.merge(SegmentMerger.java:122)
         at org.apache.lucene.index.IndexWriter.mergeMiddle(IndexWriter.java:2898)
         at org.apache.lucene.index.IndexWriter.merge(IndexWriter.java:2647)
         at org.apache.lucene.index.ConcurrentMergeScheduler$MergeThread.run(ConcurrentMergeScheduler.java:232)
java.io.IOException: background merge hit exception: _5u:c372 _5v:c5 into _5w [optimize]
         at org.apache.lucene.index.IndexWriter.optimize(IndexWriter.java:1621)
         at org.apache.lucene.index.IndexWriter.optimize(IndexWriter.java:1571)
Caused by: java.io.IOException: read past EOF
         at org.apache.lucene.store.BufferedIndexInput.refill(BufferedIndexInput.java:146)
         at org.apache.lucene.store.BufferedIndexInput.readByte(BufferedIndexInput.java:38)
         at org.apache.lucene.store.IndexInput.readVInt(IndexInput.java:76)
         at org.apache.lucene.index.TermVectorsReader.get(TermVectorsReader.java:207)
         at org.apache.lucene.index.SegmentReader.getTermFreqVectors(SegmentReader.java:692)
         at org.apache.lucene.index.SegmentMerger.mergeVectors(SegmentMerger.java:279)
         at org.apache.lucene.index.SegmentMerger.merge(SegmentMerger.java:122)
         at org.apache.lucene.index.IndexWriter.mergeMiddle(IndexWriter.java:2898)
         at org.apache.lucene.index.IndexWriter.merge(IndexWriter.java:2647)
         at org.apache.lucene.index.ConcurrentMergeScheduler$MergeThread.run(ConcurrentMergeScheduler.java:232)
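Both "Caused by" chains bottom out in IndexInput.readVInt() called from TermVectorsReader.get(): while decoding a variable-length integer, the reader needed another byte and BufferedIndexInput.refill() found nothing left in the file. The self-contained sketch below (a hypothetical VIntSketch class, not Lucene code) mimics that path. It decodes Lucene's VInt format (7 data bits per byte, low-order group first, high bit set on every byte except the last) and fails with the same message when a read starts at, or runs past, the end of the data. It only illustrates what the error means; it says nothing about why the term-vector data came up short here.

```java
import java.io.IOException;

/** Illustrative stand-in for the readVInt -> readByte -> refill path; not Lucene's classes. */
public class VIntSketch {

    private final byte[] data;
    private int pos;

    public VIntSketch(byte[] data, int pos) {
        this.data = data;
        this.pos = pos;
    }

    /** Returns the next byte, or fails the way BufferedIndexInput.refill() does at end of file. */
    public byte readByte() throws IOException {
        if (pos >= data.length) {
            throw new IOException("read past EOF");
        }
        return data[pos++];
    }

    /** Decodes a VInt: 7 bits per byte, low-order bits first, high bit means "more bytes follow". */
    public int readVInt() throws IOException {
        byte b = readByte();
        int value = b & 0x7F;
        for (int shift = 7; (b & 0x80) != 0; shift += 7) {
            b = readByte();
            value |= (b & 0x7F) << shift;
        }
        return value;
    }

    public static void main(String[] args) throws IOException {
        // 300 is encoded as 0xAC (low 7 bits 0x2C plus continuation bit) followed by 0x02.
        byte[] file = {(byte) 0xAC, 0x02};
        System.out.println(new VIntSketch(file, 0).readVInt()); // prints 300

        // A read that starts at the end of the data fails immediately.
        try {
            new VIntSketch(file, 2).readVInt();
        } catch (IOException e) {
            System.out.println(e.getMessage()); // prints "read past EOF"
        }

        // A value whose continuation bit promises a byte that is missing also fails.
        try {
            new VIntSketch(new byte[] {(byte) 0xAC}, 0).readVInt();
        } catch (IOException e) {
            System.out.println(e.getMessage()); // prints "read past EOF"
        }
    }
}
```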

---------------------------------------------------------------------
To unsubscribe, e-mail: java-dev-unsubscribe@lucene.apache.org
For additional commands, e-mail: java-dev-help@lucene.apache.org

