From: Adam Ratcliffe
To: lucene-user@jakarta.apache.org
Subject: XMLDirectory for index storage as XML fails during segment merge
Date: Sun, 14 Apr 2002 11:39:07 +1200

I'm using lucene-1.2-rc4 to index XML elements stored in a native XML
database. An Index class is used to generate an index for a set of nodes;
the index is then persisted to the DOM tree.

Following the pattern of the RAMDirectory class, I've created an
XMLDirectory that has operations for writing index files as elements and
others for reading index content given an element representing an index
file. (A simplified sketch of the write path appears at the very end of
this message.) This class works fine on test datasets but fails after
approximately 3,000 documents have been indexed. An EOF error is thrown
during the refill() operation of the InputStream while the segments are
being merged. I've attached the relevant stack trace below.

[Index.java:78] java.io.IOException: read past EOF
java.io.IOException: read past EOF
        at org.apache.lucene.store.InputStream.refill(Unknown Source)
        at org.apache.lucene.store.InputStream.readByte(Unknown Source)
        at org.apache.lucene.store.InputStream.readChars(Unknown Source)
        at org.apache.lucene.store.InputStream.readString(Unknown Source)
        at org.apache.lucene.index.FieldsReader.doc(Unknown Source)
        at org.apache.lucene.index.SegmentReader.document(Unknown Source)
        at org.apache.lucene.index.SegmentMerger.mergeFields(Unknown Source)
        at org.apache.lucene.index.SegmentMerger.merge(Unknown Source)
        at org.apache.lucene.index.IndexWriter.mergeSegments(Unknown Source)
        at org.apache.lucene.index.IndexWriter.maybeMergeSegments(Unknown Source)
        at org.apache.lucene.index.IndexWriter.addDocument(Unknown Source)
        at com.parochus.search.Index.create(Index.java:72)
        at com.parochus.search.TstIndex.main(TstIndex.java:43)

I've reviewed the Lucene and jGuru FAQs and conclude that Lucene should be
comfortable indexing millions of documents within a single index, so this
error doesn't appear to be caused by reaching any upper limit of Lucene's
handling capability. I'm wondering whether it indicates that the recorded
length of a segment file doesn't match the number of bytes actually
written to it. Has anybody got ideas for tests I could run to determine
whether this is the cause of the problem? (A rough sketch of the check I
have in mind follows below.)

Regards,
Adam Ratcliffe
adam@prema.co.nz
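
P.S. To make the question concrete, here is a rough, untested sketch of the
consistency check I have in mind. It uses only the Directory API (list(),
fileLength(), openFile()) and counts how many bytes each file actually
yields before refill() reports "read past EOF", then compares that against
the length the directory records for the file; the class and method names
other than the Lucene ones are mine, invented for the sketch.

    import java.io.IOException;
    import org.apache.lucene.store.Directory;
    import org.apache.lucene.store.InputStream;

    public class DirectoryLengthCheck {

        // For every file the Directory reports, compare fileLength()
        // against the number of bytes that can actually be read back.
        public static void check(Directory dir) throws IOException {
            String[] names = dir.list();
            for (int i = 0; i < names.length; i++) {
                long expected = dir.fileLength(names[i]);
                InputStream in = dir.openFile(names[i]);
                long actual = 0;
                try {
                    // readByte() throws "read past EOF" once the stream
                    // is exhausted, so count bytes until that happens.
                    while (true) {
                        in.readByte();
                        actual++;
                    }
                } catch (IOException eof) {
                    // reached the real end of the stored data
                } finally {
                    in.close();
                }
                if (actual != expected)
                    System.err.println(names[i] + ": fileLength()=" + expected
                            + " but only " + actual + " bytes readable");
            }
        }
    }

If any file shows a shortfall, the recorded length and the bytes written
have diverged, which would explain FieldsReader running off the end of the
stored-fields data during the merge.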
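
P.P.S. For context, the write half of XMLDirectory follows RAMOutputStream.
The sketch below is simplified from memory rather than being my actual
code: XmlFile and its methods are placeholder names standing in for my DOM
helpers. The detail I'm now suspicious of is flagged in the comments: the
store.OutputStream base class buffers writes, so the last partial buffer
only reaches flushBuffer() when close() runs. If the element were persisted
before that final flush, the stored data would come up short against the
recorded length, producing exactly this kind of "read past EOF".

    import java.io.IOException;
    import org.apache.lucene.store.OutputStream;

    // Placeholder for the DOM-backed storage; not my real class.
    interface XmlFile {
        void writeBytes(long pos, byte[] b, int len) throws IOException;
        long getLength();
        void setLength(long len);
        void persistToDom() throws IOException;
    }

    class XmlOutputStream extends OutputStream {
        private final XmlFile file;
        private long pointer = 0;

        XmlOutputStream(XmlFile file) {
            this.file = file;
        }

        // Called by the base class whenever its internal buffer fills,
        // and once more from close() via flush().
        protected void flushBuffer(byte[] b, int len) throws IOException {
            file.writeBytes(pointer, b, len);
            pointer += len;
            if (pointer > file.getLength())
                file.setLength(pointer);
        }

        public long length() throws IOException {
            return file.getLength();
        }

        public void seek(long pos) throws IOException {
            super.seek(pos);  // flushes pending bytes before moving
            pointer = pos;
        }

        public void close() throws IOException {
            super.close();        // flushes the last partial buffer
            file.persistToDom();  // must happen only after that flush
        }
    }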