lucene-dev mailing list archives

From Vince Taluskie <vi...@taluskie.com>
Subject negative number of docs?
Date Wed, 24 Sep 2003 19:04:59 GMT

Hello,

I'm using Lucene for a legacy records project, and up until now things have
worked very well.  The last round of additions I made to the largest index
appears to have hit a limit or bug.

I'm running into a problem with Lucene v1.2 where numDocs() returns a
negative number of documents (which is causing hits.length() to throw an
exception) after merging several large indexes together.

In this project, there are 11 types of legacy data reports - they are put 
into a fielded format where each row of data becomes a document.  Data 
from multiple divisions is indexed and then merged together into a single 
index for each report type.  

The indexes typically hold about 75M documents, but the largest had 242M
documents before the latest update.  The latest merge of two intermediary
indexes at 195M docs and 96M docs should have put it at 291M documents.
The merge process (which uses the IndexWriter.addIndexes() call) ran
without any errors and the final index size looks correct, but when I
check the number of documents in it, it returns a negative number.

Before:

Index /rr/all_indexes/SL contains 242582695 documents

After:

Index /rr/tmpindexes/global/SL contains -245430166 documents
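
As a sanity check on the arithmetic, here is a minimal sketch in plain Java
(no Lucene calls).  The class and method names are hypothetical, and the two
intermediate counts are the rounded figures quoted above, not exact values.
It only shows that the expected total is well inside the range of a signed
32-bit int, and what the reported negative value looks like when read as an
unsigned 32-bit number.

```java
class DocCountCheck {
    // Add two per-index document counts in 64-bit arithmetic so the sum cannot wrap.
    static long expectedTotal(int a, int b) {
        return (long) a + (long) b;
    }

    // True if a count can be represented as a non-negative Java int.
    static boolean fitsInInt(long n) {
        return n >= 0 && n <= Integer.MAX_VALUE;
    }

    // Reinterpret a signed 32-bit value as unsigned, widened to long.
    static long asUnsigned(int n) {
        return n & 0xFFFFFFFFL;
    }

    public static void main(String[] args) {
        int reported = -245430166;                               // value printed after the merge
        long expected = expectedTotal(195_000_000, 96_000_000);  // rounded counts from the message

        System.out.println("expected total:       " + expected);
        System.out.println("fits in signed int:   " + fitsInInt(expected));
        System.out.println("Integer.MAX_VALUE:    " + Integer.MAX_VALUE);
        System.out.println("reported as unsigned: " + asUnsigned(reported));
    }
}
```

Read as an unsigned value, the reported count is 4049537130, which is not
the expected 291M, so a simple wraparound of the summed count does not
obviously account for it.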

Attempts to perform searches on this index cause exceptions when the Hits
object is returned by the IndexSearcher.search() method.  The trace
looks like:

11:53:36,377 ERROR [Engine] StandardWrapperValve[RRSearcher]: Servlet.service() for servlet RRSearcher threw exception
java.lang.NegativeArraySizeException
        at org.apache.lucene.index.SegmentReader.norms(Unknown Source)
        at org.apache.lucene.search.TermQuery.scorer(Unknown Source)
        at org.apache.lucene.search.BooleanQuery.scorer(Unknown Source)
        at org.apache.lucene.search.Query.scorer(Unknown Source)
        at org.apache.lucene.search.IndexSearcher.search(Unknown Source)
        at org.apache.lucene.search.Hits.getMoreDocs(Unknown Source)
        at org.apache.lucene.search.Hits.<init>(Unknown Source)
        at org.apache.lucene.search.Searcher.search(Unknown Source)
        at com.cexp.ta.rec_retention.RRSearcher.doPost(RRSearcher.java:347)
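
The exception type fits an array being sized from the negative document
count: in Java, creating an array with a negative length throws
NegativeArraySizeException.  A minimal sketch of that behaviour follows.
It assumes, without having checked the 1.2 source, that
SegmentReader.norms() allocates roughly one byte per document;
allocatePerDoc is a hypothetical stand-in, not Lucene code.

```java
class NegativeSizeDemo {
    // Hypothetical stand-in for code that sizes a per-document array from a document count.
    static byte[] allocatePerDoc(int docCount) {
        return new byte[docCount]; // throws NegativeArraySizeException when docCount < 0
    }

    // Returns true if allocating for the given count fails with NegativeArraySizeException.
    static boolean allocationFails(int docCount) {
        try {
            allocatePerDoc(docCount);
            return false;
        } catch (NegativeArraySizeException e) {
            return true;
        }
    }

    public static void main(String[] args) {
        // The count reported for the merged index.
        System.out.println("negative count fails: " + allocationFails(-245430166));
        // A small positive count, used here only to keep the demo cheap.
        System.out.println("positive count fails: " + allocationFails(16));
    }
}
```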


I figured I would be fine with the number of documents up to the 2-4B
range, and the data uploads for the project are finished, so the indexes
shouldn't need to get larger after this.  But it looks like I've hit a
limit somewhere between 242M and 291M documents.  Should I file a bug on this?

Recommendations?  I could arbitrarily split the indexes or re-index with 
1.3 if the limit is fixed there - but the simplicity of the unified search 
is a real plus.

Vince

