lucene-java-user mailing list archives

From Doug Cutting
Subject Re: "docMap" array in SegmentMergeInfo
Date Wed, 13 Jul 2005 21:31:08 GMT
Lokesh Bajaj wrote:
> For a very large index where we might want to delete/replace some documents, this would
> require a lot of memory (for 100 million documents, this would need 381 MB of memory). Is
> there any reason why this was implemented this way?

In practice this has not been an issue.  A single index with 100M 
documents is usually quite slow to search.  When collections get this 
big folks tend to instead search multiple indexes in parallel in order 
to keep response times acceptable.  Also, 381 MB of RAM is often not a
problem for folks with 100M documents.  But this is not to say that it 
could never be a problem.  For folks with limited RAM and/or lots of 
small documents it could indeed be an issue.
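
For reference, the 381 MB figure follows from a dense map holding one 4-byte int per document: 100,000,000 x 4 bytes = 400,000,000 bytes, or roughly 381.5 MiB. The sketch below only reproduces that arithmetic. It assumes docMap is a plain int[] with one entry per document and ignores the JVM's small array header; the class and method names are hypothetical, not part of Lucene.

```java
// Hypothetical helper, not Lucene code: estimates the size of a dense
// per-document int[] map, ignoring the JVM array header.
class DocMapMemory {
    // One 4-byte int per document number in the segment.
    static long denseBytes(long maxDoc) {
        return maxDoc * Integer.BYTES;
    }

    // Convert bytes to mebibytes (1 MiB = 1024 * 1024 bytes).
    static double toMiB(long bytes) {
        return bytes / (1024.0 * 1024.0);
    }

    public static void main(String[] args) {
        long bytes = denseBytes(100_000_000L);
        System.out.println(bytes + " bytes, about " + toMiB(bytes) + " MiB");
    }
}
```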

> It seems like this could be implemented as a much smaller array that only keeps track
> of the deleted document numbers and it would still be very efficient to calculate the new
> document number by using this much smaller array. Has this been done by anyone else or been
> considered for change in the Lucene code?

Please submit a patch to the java-dev list.
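
A minimal sketch of the idea in the quoted question, not code from Lucene and not a proposed patch: keep only the sorted deleted document numbers, and compute a document's new number as its old number minus the count of deletions that precede it, found by binary search. The class name CompactDocMap, its method map, and the use of -1 to mark a deleted document are assumptions made for illustration.

```java
import java.util.Arrays;

// Hypothetical sketch: maps old document numbers to new ones using only
// the sorted list of deleted document numbers.
class CompactDocMap {
    private final int[] deleted; // sorted, distinct deleted doc numbers

    CompactDocMap(int[] deletedDocs) {
        // Copy, sort, and de-duplicate so binary search is valid.
        this.deleted = Arrays.stream(deletedDocs).sorted().distinct().toArray();
    }

    // Returns the document's number after deletions are squeezed out,
    // or -1 (an assumed sentinel) if the document itself is deleted.
    int map(int oldDoc) {
        int pos = Arrays.binarySearch(deleted, oldDoc);
        if (pos >= 0) {
            return -1; // oldDoc is in the deleted list
        }
        // For a missing key, binarySearch returns -(insertionPoint) - 1,
        // and the insertion point equals the number of deletions below oldDoc.
        int deletedBefore = -(pos + 1);
        return oldDoc - deletedBefore;
    }

    public static void main(String[] args) {
        CompactDocMap m = new CompactDocMap(new int[] {2, 5});
        for (int doc = 0; doc < 7; doc++) {
            System.out.println(doc + " -> " + m.map(doc));
        }
    }
}
```

Memory here is 4 bytes per deleted document rather than 4 bytes per document, at the cost of an O(log d) lookup for d deletions instead of a single array read. Whether that trade is worthwhile during a merge would need measuring.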

