lucene-dev mailing list archives

From "Michael McCandless (JIRA)" <>
Subject [jira] [Commented] (LUCENE-8757) Better Segment To Thread Mapping Algorithm
Date Wed, 08 May 2019 10:31:00 GMT


Michael McCandless commented on LUCENE-8757:

Whoa, fast iterations over here!

I think there is an important justification for the second criterion (the number of segments in
each work unit, or slice). Suppose you have an index with some large segments and a long tail of
small segments (this easily happens if your machine has substantial CPU concurrency and you index
with multiple threads). Since there is a fixed cost for visiting each segment, putting too many
small segments into one work unit multiplies those fixed costs, and that one work unit can become
too slow even though it does not actually visit many documents.
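To make the fixed-cost argument concrete, here is a toy cost model in Java. It is only a sketch: the per-segment and per-document costs are hypothetical numbers chosen for illustration, not measurements from Lucene. Each segment in a slice is charged one fixed visiting cost plus a cost proportional to its document count.

```java
import java.util.Collections;
import java.util.List;

/** Toy cost model for one work unit (slice); all constants are hypothetical. */
final class SliceCostModel {
    // Hypothetical fixed cost paid for visiting any segment, in arbitrary units.
    static final double FIXED_COST_PER_SEGMENT = 1_000.0;
    // Hypothetical cost per document visited, in the same units.
    static final double COST_PER_DOC = 1.0;

    /** Estimated cost of a slice whose segments have the given document counts. */
    static double sliceCost(List<Integer> segmentDocCounts) {
        double cost = 0;
        for (int docs : segmentDocCounts) {
            cost += FIXED_COST_PER_SEGMENT + COST_PER_DOC * docs;
        }
        return cost;
    }

    public static void main(String[] args) {
        // One large segment: a single fixed cost plus 100,000 docs.
        double large = sliceCost(List.of(100_000));
        // 200 tiny segments of 100 docs each: only 20,000 docs in total.
        double tiny = sliceCost(Collections.nCopies(200, 100));
        System.out.println("large=" + large + " tiny=" + tiny); // large=101000.0 tiny=220000.0
    }
}
```

Under these assumed costs, the slice of 200 tiny segments visits one fifth as many documents yet costs more than twice as much, because the fixed per-segment cost dominates.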

I think we should keep it?

Re: the choice of the constants – I ran some performance tests quite a while ago on our
production data/queries and a machine with sizable concurrency ({{i3.16xlarge}}) and found
those two constants to be a sweet spot at the time.

But let's also remember: this is simply the default segment -> work unit assignment, and
expert users can always override it. Good defaults are important ;)
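A default assignment that applies both criteria could look roughly like the greedy sketch below. This is a hypothetical illustration, not Lucene's actual code: segments are represented only by their document counts, the method name `slices` and its parameters `maxDocsPerSlice` and `maxSegmentsPerSlice` are assumptions, and the cap values in `main` (250,000 docs, 5 segments) are chosen for illustration rather than taken from this thread.

```java
import java.util.ArrayList;
import java.util.Comparator;
import java.util.List;

/** Hypothetical greedy segment-to-slice grouping with two caps; not Lucene's actual code. */
final class SegmentSlicer {

    /**
     * Groups segments (given by document count) into slices, largest first.
     * A segment larger than maxDocsPerSlice gets its own slice; smaller segments
     * accumulate until the slice exceeds maxDocsPerSlice docs or reaches
     * maxSegmentsPerSlice segments.
     */
    static List<List<Integer>> slices(List<Integer> segmentDocCounts,
                                      int maxDocsPerSlice, int maxSegmentsPerSlice) {
        List<Integer> sorted = new ArrayList<>(segmentDocCounts);
        sorted.sort(Comparator.reverseOrder()); // largest segments first
        List<List<Integer>> result = new ArrayList<>();
        List<Integer> current = new ArrayList<>();
        long docSum = 0;
        for (int docs : sorted) {
            if (docs > maxDocsPerSlice) {
                result.add(List.of(docs)); // big segment: dedicated slice
                continue;
            }
            current.add(docs);
            docSum += docs;
            // Close the slice when either cap is hit.
            if (docSum > maxDocsPerSlice || current.size() >= maxSegmentsPerSlice) {
                result.add(current);
                current = new ArrayList<>();
                docSum = 0;
            }
        }
        if (!current.isEmpty()) {
            result.add(current); // leftover small segments
        }
        return result;
    }

    public static void main(String[] args) {
        // Two large segments plus a long tail of twelve small ones (hypothetical sizes).
        List<Integer> segments = new ArrayList<>(List.of(400_000, 300_000));
        for (int i = 0; i < 12; i++) {
            segments.add(1_000);
        }
        System.out.println(slices(segments, 250_000, 5));
        // [[400000], [300000], [1000, 1000, 1000, 1000, 1000], [1000, 1000, 1000, 1000, 1000], [1000, 1000]]
    }
}
```

With only the document cap, all twelve small segments would land in a single slice (12,000 docs is well under the cap), so that one work unit would pay twelve fixed visiting costs; the segment cap spreads the tail across three slices instead.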

> Better Segment To Thread Mapping Algorithm
> ------------------------------------------
>                 Key: LUCENE-8757
>                 URL:
>             Project: Lucene - Core
>          Issue Type: Improvement
>            Reporter: Atri Sharma
>            Priority: Major
>         Attachments: LUCENE-8757.patch, LUCENE-8757.patch, LUCENE-8757.patch
> The current segment-to-thread allocation algorithm always allocates one thread per
> segment. This is detrimental to performance when segment sizes are skewed, since small
> segments also get their own dedicated thread, which can degrade performance through
> context switching.
> A better algorithm that is cognizant of size skew would perform better in realistic
> scenarios.

This message was sent by Atlassian JIRA
