lucene-java-user mailing list archives

From Adrien Grand <jpou...@gmail.com>
Subject Re: How to use concurrency efficiently
Date Tue, 02 Apr 2013 14:06:53 GMT
On Tue, Apr 2, 2013 at 2:29 PM, Igor Shalyminov
<ishalyminov@yandex-team.ru> wrote:
> Hello!

Hi Igor,

> I have a ~20GB index and try to make a concurrent search over it.
> The index has 16 segments, I run SpanQuery.getSpans() on each segment concurrently.
> I see really small performance improvement of searching concurrently. I suppose, the
> reason is that the sizes of the segments are very non-uniform (3 segments have ~20 000 docs
> each, and the others have less than 1 000 each).
> How to make more uniformly sized segments (I now use just writer.forceMerge(16)), and
> are multiple index segments the most important thing in Lucene concurrency?

Segments have non-uniform sizes by design. A new segment is written
every time a flush happens (when the RAM buffer is full or when you
explicitly call commit()). When there are too many segments, Lucene
merges some of them while new segments keep being generated as you add
data. So the "flush" segments will always be small, while segments
resulting from a merge will be much larger since they contain data
from several other segments.
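To make the flush/merge cycle above concrete, here is a toy simulation in plain Java. It is a sketch under stated assumptions, not Lucene code: it assumes a flush every 1,000 docs and a hypothetical MERGE_FACTOR of 10 same-"level" segments before they are merged, i.e. a simplified log-style policy. Lucene's real merge policies (e.g. TieredMergePolicy) choose merges differently, so only the shape of the result matters, not the exact numbers.

```java
import java.util.ArrayList;
import java.util.List;

/**
 * Toy model of flush + merge (NOT Lucene's actual merge policy): each flush
 * writes a small level-0 segment, and whenever MERGE_FACTOR segments of the
 * same level pile up at the tail, they are merged into one segment one level up.
 */
class SegmentSizeSimulation {

    /** Hypothetical number of same-level segments that triggers a merge. */
    static final int MERGE_FACTOR = 10;

    /** Returns segment sizes (in docs) after indexing totalDocs, flushing every docsPerFlush docs. */
    static List<Integer> simulate(int totalDocs, int docsPerFlush) {
        List<int[]> segments = new ArrayList<>(); // each entry: {docCount, level}
        for (int indexed = 0; indexed < totalDocs; indexed += docsPerFlush) {
            // A flush always produces a small, fresh level-0 segment.
            segments.add(new int[] {Math.min(docsPerFlush, totalDocs - indexed), 0});
            // Merges can cascade: ten level-0 segments become one level-1 segment,
            // which may in turn complete a run of ten level-1 segments.
            while (tailIsMergeable(segments)) {
                int n = segments.size();
                List<int[]> tail = segments.subList(n - MERGE_FACTOR, n);
                int mergedDocs = 0;
                for (int[] s : tail) {
                    mergedDocs += s[0];
                }
                int nextLevel = tail.get(0)[1] + 1;
                tail.clear(); // removes the merged segments from the backing list
                segments.add(new int[] {mergedDocs, nextLevel});
            }
        }
        List<Integer> sizes = new ArrayList<>();
        for (int[] s : segments) {
            sizes.add(s[0]);
        }
        return sizes;
    }

    /** True if the last MERGE_FACTOR segments all share the same level. */
    static boolean tailIsMergeable(List<int[]> segments) {
        int n = segments.size();
        if (n < MERGE_FACTOR) {
            return false;
        }
        int level = segments.get(n - 1)[1];
        for (int i = n - MERGE_FACTOR; i < n; i++) {
            if (segments.get(i)[1] != level) {
                return false;
            }
        }
        return true;
    }

    public static void main(String[] args) {
        // Hypothetical workload: 125,000 docs, one flush per 1,000 docs.
        System.out.println(simulate(125_000, 1_000));
    }
}
```

With these assumed parameters it prints [100000, 10000, 10000, 1000, 1000, 1000, 1000, 1000]: one merged segment holds 80% of the docs while the most recent flushes are still tiny, the same kind of skew described in the question.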

Even if segments are collected concurrently, IndexSearcher still needs to
merge the per-segment results at the end. Since your segments are
very small (about 20,000 docs at most), the cost of setting up each
per-segment collection and merging the results may not be negligible
compared to the cost of collecting a single segment.
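The sketch below illustrates both effects with plain Java threads instead of Lucene. It is an assumption-laden model, not Lucene's API: segments are hypothetical int arrays of document scores, and topK, search and speedupBound are illustrative names. Each segment is collected on a thread pool, but the final merge can only start once the slowest (largest) segment finishes, and it runs on a single thread. If per-segment cost is roughly proportional to doc count, the best possible speedup is the total doc count divided by the doc count of the largest segment.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.Collections;
import java.util.List;
import java.util.PriorityQueue;
import java.util.Random;
import java.util.concurrent.ExecutionException;
import java.util.concurrent.ExecutorService;
import java.util.concurrent.Executors;
import java.util.concurrent.Future;

/**
 * Plain-Java model of per-segment concurrent collection (NOT Lucene's API):
 * each "segment" is a hypothetical int[] of document scores.
 */
class PerSegmentSearch {

    /** Keeps the k highest scores of one segment using a size-k min-heap. */
    static List<Integer> topK(int[] scores, int k) {
        PriorityQueue<Integer> heap = new PriorityQueue<>(); // smallest score at the head
        for (int score : scores) {
            heap.offer(score);
            if (heap.size() > k) {
                heap.poll(); // evict the current smallest
            }
        }
        return new ArrayList<>(heap);
    }

    /** Collects every segment on the pool, then merges the per-segment hits on the calling thread. */
    static List<Integer> search(List<int[]> segments, int k, ExecutorService pool) {
        List<Future<List<Integer>>> futures = new ArrayList<>();
        for (int[] segment : segments) {
            futures.add(pool.submit(() -> topK(segment, k)));
        }
        // Merge phase: it waits for the slowest (largest) segment and runs sequentially.
        List<Integer> candidates = new ArrayList<>();
        for (Future<List<Integer>> future : futures) {
            try {
                candidates.addAll(future.get());
            } catch (InterruptedException e) {
                Thread.currentThread().interrupt();
                throw new IllegalStateException("interrupted while collecting", e);
            } catch (ExecutionException e) {
                throw new IllegalStateException("segment collection failed", e.getCause());
            }
        }
        int[] merged = candidates.stream().mapToInt(Integer::intValue).toArray();
        List<Integer> top = topK(merged, k);
        top.sort(Collections.reverseOrder());
        return top;
    }

    /** Best-case speedup if per-segment cost is proportional to its doc count. */
    static double speedupBound(int[] segmentSizes) {
        long total = 0;
        int largest = 0;
        for (int size : segmentSizes) {
            total += size;
            largest = Math.max(largest, size);
        }
        return (double) total / largest;
    }

    public static void main(String[] args) {
        // Hypothetical layout mirroring the question: 3 large segments, 13 small ones.
        int[] sizes = new int[16];
        Arrays.fill(sizes, 0, 3, 20_000);
        Arrays.fill(sizes, 3, 16, 1_000);
        Random random = new Random(0);
        List<int[]> segments = new ArrayList<>();
        for (int size : sizes) {
            segments.add(random.ints(size, 0, 1_000_000).toArray());
        }
        ExecutorService pool = Executors.newFixedThreadPool(sizes.length);
        try {
            System.out.println("top hits: " + search(segments, 10, pool));
        } finally {
            pool.shutdown();
        }
        System.out.printf("speedup bound: %.2f%n", speedupBound(sizes));
    }
}
```

Plugging in the sizes from the question (three segments of about 20,000 docs, thirteen of fewer than 1,000), speedupBound gives at most (3 × 20,000 + 13 × 1,000) / 20,000 ≈ 3.65, however many threads are used; per-task setup and the sequential merge only lower that further.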

--
Adrien

---------------------------------------------------------------------
To unsubscribe, e-mail: java-user-unsubscribe@lucene.apache.org
For additional commands, e-mail: java-user-help@lucene.apache.org

