lucene-java-user mailing list archives

Site index · List index
Message view « Date » · « Thread »
Top « Date » · « Thread »
From Peter Keegan <>
Subject Re: "docMap" array in SegmentMergeInfo
Date Tue, 11 Oct 2005 22:14:49 GMT
On a multi-cpu system, this loop to build the docMap array can cause severe
thread thrashing because of the synchronized method 'isDeleted'. I have
observed this on an index with over 1 million documents (which contains a
few thousand deleted docs) when multiple threads perform a search with
either a sort field or a range query. A stack dump shows all threads here:

waiting for monitor entry [0x6d2cf000..0x6d2cfd6c] at
org.apache.lucene.index.SegmentReader.isDeleted( -
waiting to lock <0x04e40278>

The performances worsens as the number of threads increases. The searches
may take minutes to complete.
If only a single thread issues the search, it completes fairly quickly. I
also noticed from looking at the code that the docMap doesn't appear to be
used in these cases. It seems only to be used for merging segments. If the
index is in 'search/read-only' mode, is there a way around this bottleneck?


On 7/13/05, Doug Cutting <> wrote:
> Lokesh Bajaj wrote:
> > For a very large index where we might want to delete/replace some
> documents, this would require a lot of memory (for 100 million documents,
> this would need 381 MB of memory). Is there any reason why this was
> implemented this way?
> In practice this has not been an issue. A single index with 100M
> documents is usually quite slow to search. When collections get this
> big folks tend to instead search multiple indexes in parallel in order
> to keep response times acceptable. Also, 381Mb of RAM is often not a
> problem for folks with 100M documents. But this is not to say that it
> could never be a problem. For folks with limited RAM and/or lots of
> small documents it could indeed be an issue.
> > It seems like this could be implemented as a much smaller array that
> only keeps track of the deleted document numbers and it would still be very
> efficient to calculate the new document number by using this much smaller
> array. Has this been done by anyone else or been considered for change in
> the Lucene code?
> Please submit a patch to the java-dev list.
> Doug
> ---------------------------------------------------------------------
> To unsubscribe, e-mail:
> For additional commands, e-mail:

  • Unnamed multipart/alternative (inline, None, 0 bytes)
View raw message