lucene-java-user mailing list archives

From Joe Berkovitz <jberkov...@ruckusnetwork.com>
Subject Read past EOF and negative bufferLength problem (1.4 rc2)
Date Tue, 27 Apr 2004 19:00:15 GMT
Using Lucene 1.4 rc2, I've run into a fatal problem: certain 
PhraseQueries cause a "Read Past EOF" exception (see below), while other 
PhraseQueries enter an infinite loop due to a negative bufferLength 
field in CSInputStream.  Environment is WinXP, JDK 1.4.2.  The index is 
large: 1,000,000 documents, each of which has 3 stored, indexed fields 
of 10-100 characters.

The problem does not occur with Lucene 1.3 indexing the exact same set 
of Documents.  Nor does it occur with 1.4 rc2 using various smaller sets 
of documents.  Right now my workaround is to use Lucene 1.3.

For the PhraseQuery "a y" (that's right, two single-letter terms), the 
read-past-EOF exception is as follows:

java.io.IOException: read past EOF
    at org.apache.lucene.store.InputStream.refill(InputStream.java:154)
    at org.apache.lucene.store.InputStream.readByte(InputStream.java:43)
    at org.apache.lucene.store.InputStream.readVInt(InputStream.java:83)
    at org.apache.lucene.index.SegmentTermPositions.next(SegmentTermPositions.java:59)
    at org.apache.lucene.index.SegmentTermDocs.skipTo(SegmentTermDocs.java:187)
    at org.apache.lucene.search.PhrasePositions.skipTo(PhrasePositions.java:47)
    at org.apache.lucene.search.PhraseScorer.next(PhraseScorer.java:69)
    at org.apache.lucene.search.Scorer.score(Scorer.java:37)
    at org.apache.lucene.search.IndexSearcher.search(IndexSearcher.java:81)
    at org.apache.lucene.search.Hits.getMoreDocs(Hits.java:64)
    at org.apache.lucene.search.Hits.<init>(Hits.java:43)
    at org.apache.lucene.search.Searcher.search(Searcher.java:33)
    at org.apache.lucene.search.Searcher.search(Searcher.java:27)
    at...

For the PhraseQuery "z y", the search enters an infinite loop.  The loop 
arises from a condition similar to read-past-EOF: at line 153 of 
org.apache.lucene.store.InputStream, the value of bufferLength goes 
negative because the value of start exceeds the value of end.  This in 
turn seems to be a consequence of a seek to a position past the end of 
the stream.
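
To illustrate the arithmetic, here is a minimal sketch of how a seek past 
the end of the stream could yield a negative buffer length.  This is a 
simplified model and not the actual Lucene source: the class name 
RefillSketch, both method names, and the BUFFER_SIZE value are assumptions 
made for illustration.  It assumes only that end is clamped to the stream 
length and that the buffer length is computed as end minus start.

```java
import java.io.IOException;

public class RefillSketch {
    // Hypothetical buffer size, chosen only for this illustration.
    static final int BUFFER_SIZE = 1024;

    // Models the length computation described above: the window end is
    // clamped to the stream length, then the length is end - start.
    static int bufferLength(long start, long streamLength) {
        long end = Math.min(start + BUFFER_SIZE, streamLength);
        return (int) (end - start);
    }

    // A defensive variant (an assumption, not a known fix): treat any
    // non-positive length as EOF, so a seek past the end fails fast
    // instead of leaving a negative length behind.
    static int checkedBufferLength(long start, long streamLength) throws IOException {
        int len = bufferLength(start, streamLength);
        if (len <= 0) {
            throw new IOException("read past EOF: start=" + start
                    + ", length=" + streamLength);
        }
        return len;
    }

    public static void main(String[] args) {
        System.out.println(bufferLength(0, 100));   // normal read: 100
        System.out.println(bufferLength(100, 100)); // exactly at EOF: 0
        System.out.println(bufferLength(150, 100)); // seek past EOF: -50
    }
}
```

If the real code tests only for a length of exactly zero, the third case 
would slip past the EOF check, which would be consistent with seeing an 
exception for one query and a loop for the other.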

Something is clearly corrupt somewhere in the index structure.  I'd love 
to post the files that reproduce the problem, but it's about 100 MB of 
data.  If someone on the Lucene dev team wants to give me an upload 
destination, I can post the index somewhere and you can play with the 
problem.

regards and thanks for any assistance,

Joe Berkovitz
Chief Architect
Ruckus Network, Inc.

