lucene-java-user mailing list archives

From Adam Ratcliffe
Subject XMLDirectory for index storage as XML fails during segment merge
Date Sat, 13 Apr 2002 23:39:07 GMT
I'm using lucene-1.2-rc4 to index XML elements stored in a native XML 
database. An Index class is used to generate an index for a set of nodes that 
is then persisted to the DOM tree.

Following the pattern of the RAMDirectory class I've created an XMLDirectory 
that has operations for writing index files as elements and others for 
reading index content given an element representing an index file.

This class works fine on test datasets but fails after approx. 3000 documents 
have been indexed.

An EOF error is thrown during the refill() operation of the InputStream while 
the segments are being merged. I've attached the relevant stack trace below.

[] read past EOF read past EOF
	at Source)
	at Source)
	at Source)
	at Source)
	at org.apache.lucene.index.FieldsReader.doc(Unknown Source)
	at org.apache.lucene.index.SegmentReader.document(Unknown Source)
	at org.apache.lucene.index.SegmentMerger.mergeFields(Unknown Source)
	at org.apache.lucene.index.SegmentMerger.merge(Unknown Source)
	at org.apache.lucene.index.IndexWriter.mergeSegments(Unknown Source)
	at org.apache.lucene.index.IndexWriter.maybeMergeSegments(Unknown Source)
	at org.apache.lucene.index.IndexWriter.addDocument(Unknown Source)

I've reviewed the Lucene and JGuru FAQs and conclude that Lucene should be 
comfortable indexing millions of documents within a single index, so this 
error doesn't appear to be caused by reaching any upper limit on what Lucene 
can handle.

I'm wondering whether this indicates that the length recorded for the segment 
does not match the number of bytes actually written for it.

Anybody got any ideas for tests that I could run that would determine if this 
is the cause of the problem?
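
One test along these lines is a byte-for-byte round trip: write a known 
pattern, then check that the reported length and the bytes read back both 
match what was written. The sketch below is only an illustration and does not 
use the Lucene API. TextFileStore, Base64Store and NaiveStringStore are 
hypothetical names standing in for whatever the XMLDirectory does when it 
stores a file as element text, and the sizes around 1024 assume a buffer of 
that size. The lossy store is there only to show the check failing; it is not 
a claim about the actual cause.

```java
import java.nio.charset.StandardCharsets;
import java.util.Arrays;
import java.util.Base64;

// Hypothetical stand-in for storing one index file as XML element text.
interface TextFileStore {
    void write(byte[] data); // persist the file's bytes as text
    long length();           // length the store reports for the file
    byte[] read();           // bytes decoded back from the stored text
}

// Encodes the bytes as Base64, which is safe to keep as character data.
final class Base64Store implements TextFileStore {
    private String text = "";
    private long length;
    public void write(byte[] data) {
        text = Base64.getEncoder().encodeToString(data);
        length = data.length;
    }
    public long length() { return length; }
    public byte[] read() { return Base64.getDecoder().decode(text); }
}

// Deliberately lossy: treats raw bytes as UTF-8 text, so bytes that are
// not valid UTF-8 are replaced when decoding and do not survive.
final class NaiveStringStore implements TextFileStore {
    private String text = "";
    private long length;
    public void write(byte[] data) {
        text = new String(data, StandardCharsets.UTF_8);
        length = data.length; // recorded length, not the stored length
    }
    public long length() { return length; }
    public byte[] read() { return text.getBytes(StandardCharsets.UTF_8); }
}

final class RoundTripCheck {
    // Deterministic pattern that cycles through every byte value.
    static byte[] pattern(int size) {
        byte[] data = new byte[size];
        for (int i = 0; i < size; i++) {
            data[i] = (byte) i;
        }
        return data;
    }

    // True only if the reported length, the read-back length and the
    // read-back contents all match what was written.
    static boolean roundTrips(TextFileStore store, int size) {
        byte[] written = pattern(size);
        store.write(written);
        byte[] readBack = store.read();
        return store.length() == written.length
            && readBack.length == written.length
            && Arrays.equals(written, readBack);
    }

    public static void main(String[] args) {
        int[] sizes = {0, 1, 1023, 1024, 1025, 3000};
        for (int size : sizes) {
            System.out.println(size + " bytes: base64="
                + roundTrips(new Base64Store(), size)
                + " naive=" + roundTrips(new NaiveStringStore(), size));
        }
    }
}
```

If the same check were run against the real XMLDirectory (write through its 
output stream, close, compare the reported file length, then read back 
through its input stream), a mismatch at a particular size or byte value 
would narrow down where the length and the stored bytes diverge.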

Adam Ratcliffe
