lucene-dev mailing list archives

From "Adrien Grand (JIRA)" <j...@apache.org>
Subject [jira] [Commented] (LUCENE-8757) Better Segment To Thread Mapping Algorithm
Date Tue, 21 May 2019 07:38:00 GMT

    [ https://issues.apache.org/jira/browse/LUCENE-8757?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16844595#comment-16844595 ]

Adrien Grand commented on LUCENE-8757:
--------------------------------------

[~atris] I think it is still not correct: the docBase/maxDoc values are only visible to the
current leaf collector, whereas we need this check to span all leaf collectors that are
created from the same collector.

Looking at the AssertingCollector again, it has a check that doc IDs are collected in doc
ID order, so I wonder why this assertion didn't trip with the earlier version of your patch
that sorted leaves by decreasing maxDoc. Maybe we just got lucky? Nevertheless, I think it's
worth adding another assertion that leaves are collected in the right order and that their
doc ID spaces don't intersect, as described above. E.g. we could record a {{previousLeafMaxDoc}}
at the same level as {{maxDoc}} in AssertingCollector, and then in {{getLeafCollector}} do
something like

{code}
// generally equal, but might be greater if some leaves are skipped
assert context.docBase >= previousLeafMaxDoc;
previousLeafMaxDoc = context.docBase + context.reader().maxDoc();
{code}
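
To make the invariant concrete outside of Lucene, here is a minimal, self-contained sketch of the same bookkeeping. It is not the actual AssertingCollector code: the class name {{LeafOrderChecker}} and the method {{onLeaf}} are hypothetical, the two {{int}} parameters stand in for {{context.docBase}} and {{context.reader().maxDoc()}}, and it throws an exception instead of using {{assert}} so that it also fires when assertions are disabled.

```java
/** Hypothetical stand-in for the per-collector state discussed above; not Lucene code. */
final class LeafOrderChecker {
    // Exclusive end of the doc ID range of the last leaf seen by this collector.
    private int previousLeafMaxDoc = 0;

    /**
     * Call once per leaf, in the order leaves are handed to the collector.
     *
     * @param docBase    first global doc ID of the leaf (assumed analogue of context.docBase)
     * @param leafMaxDoc size of the leaf's doc ID space (assumed analogue of reader().maxDoc())
     */
    void onLeaf(int docBase, int leafMaxDoc) {
        // Generally equal, but might be greater if some leaves are skipped.
        if (docBase < previousLeafMaxDoc) {
            throw new IllegalStateException(
                "leaf out of order or overlapping: docBase=" + docBase
                    + " < previousLeafMaxDoc=" + previousLeafMaxDoc);
        }
        previousLeafMaxDoc = docBase + leafMaxDoc;
    }
}
```

Because the field lives on the checker rather than on a per-leaf object, the check spans all leaves seen by one collector, so a leaf order such as decreasing maxDoc is rejected as soon as a later leaf has a smaller docBase than an earlier one.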

> Better Segment To Thread Mapping Algorithm
> ------------------------------------------
>
>                 Key: LUCENE-8757
>                 URL: https://issues.apache.org/jira/browse/LUCENE-8757
>             Project: Lucene - Core
>          Issue Type: Improvement
>            Reporter: Atri Sharma
>            Assignee: Simon Willnauer
>            Priority: Major
>         Attachments: LUCENE-8757.patch, LUCENE-8757.patch, LUCENE-8757.patch, LUCENE-8757.patch,
>                      LUCENE-8757.patch, LUCENE-8757.patch, LUCENE-8757.patch, LUCENE-8757.patch,
>                      LUCENE-8757.patch
>
>
> The current segments to threads allocation algorithm always allocates one thread per
> segment. This is detrimental to performance in case of skew in segment sizes since small segments
> also get their dedicated thread. This can lead to performance degradation due to context switching
> overheads.
>
> A better algorithm which is cognizant of size skew would have better performance for
> realistic scenarios




