lucene-dev mailing list archives

From "Atri Sharma (JIRA)" <j...@apache.org>
Subject [jira] [Commented] (LUCENE-8757) Better Segment To Thread Mapping Algorithm
Date Tue, 21 May 2019 13:48:00 GMT

    [ https://issues.apache.org/jira/browse/LUCENE-8757?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16844848#comment-16844848 ]

Atri Sharma commented on LUCENE-8757:
-------------------------------------

[~jpountz] Essentially, the idea is to keep the previous leaf's maxDoc outside the scope of
the per-leaf collector and move it into AssertingCollector's state, right?

If I understood you correctly, the attached patch should fix this. I verified that the test
the previous iteration added specifically for out-of-order doc IDs catches this issue, but I
agree that AssertingCollector should have the right assertions in place.
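A minimal sketch of the idea, assuming simplified stand-ins rather than Lucene's real Collector, LeafCollector or AssertingCollector classes (OrderCheckingCollector, getLeafCollector(int docBase) and collected() are hypothetical names). It keeps the last collected global doc ID, a close variant of keeping the previous leaf's maxDoc, on the top-level collector instead of on each per-leaf collector, so visiting a leaf with lower doc IDs after a leaf with higher doc IDs trips the check:

```java
import java.util.ArrayList;
import java.util.List;

// Hypothetical stand-in for the Collector/LeafCollector split; not Lucene's API.
public class OrderCheckingCollector {
    // Kept on the top-level collector so it survives across leaves:
    // the last global doc ID (docBase + leaf-local doc) that was collected.
    private int lastGlobalDoc = -1;
    private final List<Integer> collected = new ArrayList<>();

    /** Hypothetical per-leaf collector; docBase is the leaf's offset in the top-level reader. */
    public final class LeafCollector {
        private final int docBase;

        private LeafCollector(int docBase) {
            this.docBase = docBase;
        }

        public void collect(int doc) {
            int globalDoc = docBase + doc;
            // Because lastGlobalDoc lives outside this per-leaf object, the check also
            // catches a leaf that is visited after a leaf with higher doc IDs.
            if (globalDoc <= lastGlobalDoc) {
                throw new IllegalStateException(
                    "out-of-order doc ID: " + globalDoc + " after " + lastGlobalDoc);
            }
            lastGlobalDoc = globalDoc;
            collected.add(globalDoc);
        }
    }

    public LeafCollector getLeafCollector(int docBase) {
        return new LeafCollector(docBase);
    }

    public List<Integer> collected() {
        return collected;
    }

    public static void main(String[] args) {
        // Leaves visited in docBase order: accepted.
        OrderCheckingCollector ok = new OrderCheckingCollector();
        ok.getLeafCollector(0).collect(3);
        ok.getLeafCollector(10).collect(1);
        System.out.println(ok.collected()); // [3, 11]

        // Leaf with docBase 10 visited first, as when leaves are sorted by decreasing maxDoc.
        OrderCheckingCollector bad = new OrderCheckingCollector();
        bad.getLeafCollector(10).collect(1);
        try {
            bad.getLeafCollector(0).collect(3);
        } catch (IllegalStateException e) {
            System.out.println(e.getMessage()); // out-of-order doc ID: 3 after 11
        }
    }
}
```

A check that only compared doc IDs within one leaf would accept the second sequence, because each leaf on its own collects its doc IDs in increasing order.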

 
{quote}Looking at the AssertingCollector again, it has a check that doc IDs are collected
in doc ID order, so I wonder why this assertion didn't trip with the earlier version of your
patch that sorted leaves by decreasing maxDoc. Maybe we just got lucky? 
{quote}
Do you think similar assertions/checks would make sense in IndexSearcher too? If AssertingCollector
missed this issue, maybe we should make IndexSearcher's input argument validation more robust
as well. WDYT?
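For the IndexSearcher side, a sketch of what stricter validation of slices could look like, assuming a slice is simply a list of leaves identified by ord, docBase and maxDoc. SliceValidator, Segment and validate are hypothetical names, not IndexSearcher's API. The check rejects slices that reference an unknown leaf, repeat or skip a leaf, or list leaves out of docBase order within a slice:

```java
import java.util.HashSet;
import java.util.List;
import java.util.Set;

// Hypothetical validation helper; not part of Lucene's IndexSearcher API.
public class SliceValidator {
    /** Hypothetical leaf descriptor: ord is the leaf's position, docBase its doc ID offset. */
    public static final class Segment {
        final int ord;
        final int docBase;
        final int maxDoc;

        public Segment(int ord, int docBase, int maxDoc) {
            this.ord = ord;
            this.docBase = docBase;
            this.maxDoc = maxDoc;
        }
    }

    /**
     * Rejects slices that reference an unknown leaf, use a leaf more than once, miss a leaf,
     * or list leaves out of docBase order within a slice.
     */
    public static void validate(List<List<Segment>> slices, int numLeaves) {
        Set<Integer> seen = new HashSet<>();
        for (List<Segment> slice : slices) {
            int prevEnd = 0; // docBase + maxDoc of the previous leaf in this slice
            for (Segment s : slice) {
                if (s.ord < 0 || s.ord >= numLeaves) {
                    throw new IllegalArgumentException("unknown leaf ord " + s.ord);
                }
                if (!seen.add(s.ord)) {
                    throw new IllegalArgumentException("leaf " + s.ord + " is used more than once");
                }
                if (s.docBase < prevEnd) {
                    throw new IllegalArgumentException("leaf " + s.ord + " is out of docBase order");
                }
                prevEnd = s.docBase + s.maxDoc;
            }
        }
        if (seen.size() != numLeaves) {
            throw new IllegalArgumentException(
                "slices cover " + seen.size() + " of " + numLeaves + " leaves");
        }
    }

    public static void main(String[] args) {
        Segment a = new Segment(0, 0, 100);
        Segment b = new Segment(1, 100, 50);
        Segment c = new Segment(2, 150, 10);
        validate(List.of(List.of(a), List.of(b, c)), 3); // valid: no exception
        try {
            validate(List.of(List.of(c, b), List.of(a)), 3);
        } catch (IllegalArgumentException e) {
            System.out.println(e.getMessage()); // leaf 1 is out of docBase order
        }
    }
}
```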

> Better Segment To Thread Mapping Algorithm
> ------------------------------------------
>
>                 Key: LUCENE-8757
>                 URL: https://issues.apache.org/jira/browse/LUCENE-8757
>             Project: Lucene - Core
>          Issue Type: Improvement
>            Reporter: Atri Sharma
>            Assignee: Simon Willnauer
>            Priority: Major
>         Attachments: LUCENE-8757.patch, LUCENE-8757.patch, LUCENE-8757.patch, LUCENE-8757.patch,
>                      LUCENE-8757.patch, LUCENE-8757.patch, LUCENE-8757.patch, LUCENE-8757.patch, LUCENE-8757.patch
>
>
> The current segment-to-thread allocation algorithm always allocates one thread per segment.
> This is detrimental to performance when segment sizes are skewed, since even small segments
> get their own dedicated thread, which can degrade performance through context-switching overhead.
>
> A better algorithm that is cognizant of size skew would perform better in realistic scenarios.
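The issue text does not prescribe a specific algorithm. One size-aware approach, sketched here purely as an illustration and not as the code that was committed: sort segments by maxDoc in descending order and pack them greedily into slices (one thread per slice), closing a slice once it reaches a document budget or a segment-count cap. SizeAwareSlicer, Segment, maxDocsPerSlice and maxSegmentsPerSlice are hypothetical names and parameters, not Lucene's API. Given the out-of-order collection discussed in the comment, the sketch also re-sorts each slice by docBase before returning it:

```java
import java.util.ArrayList;
import java.util.Comparator;
import java.util.List;

// Hypothetical sketch of a size-aware segment-to-thread (slice) mapping; not Lucene's API.
public class SizeAwareSlicer {
    /** Hypothetical stand-in for a leaf: its doc ID offset and its number of documents. */
    public static final class Segment {
        final int docBase;
        final int maxDoc;

        public Segment(int docBase, int maxDoc) {
            this.docBase = docBase;
            this.maxDoc = maxDoc;
        }

        @Override
        public String toString() {
            return "seg@" + docBase + "(" + maxDoc + ")";
        }
    }

    /**
     * Greedily packs segments, largest first, into slices. A slice is closed once its
     * total maxDoc reaches maxDocsPerSlice or it holds maxSegmentsPerSlice segments,
     * so a large segment gets its own slice while several small ones share a thread.
     */
    public static List<List<Segment>> slices(List<Segment> segments,
                                             int maxDocsPerSlice, int maxSegmentsPerSlice) {
        List<Segment> sorted = new ArrayList<>(segments);
        sorted.sort(Comparator.comparingInt((Segment s) -> s.maxDoc).reversed());

        List<List<Segment>> result = new ArrayList<>();
        List<Segment> current = new ArrayList<>();
        long docsInCurrent = 0;
        for (Segment s : sorted) {
            current.add(s);
            docsInCurrent += s.maxDoc;
            if (docsInCurrent >= maxDocsPerSlice || current.size() >= maxSegmentsPerSlice) {
                result.add(current);
                current = new ArrayList<>();
                docsInCurrent = 0;
            }
        }
        if (!current.isEmpty()) {
            result.add(current);
        }
        // Restore docBase order inside each slice so a collector still sees increasing doc IDs.
        for (List<Segment> slice : result) {
            slice.sort(Comparator.comparingInt(s -> s.docBase));
        }
        return result;
    }

    public static void main(String[] args) {
        List<Segment> segs = List.of(
            new Segment(0, 1000), new Segment(1000, 20), new Segment(1020, 30),
            new Segment(1050, 900), new Segment(1950, 10));
        // [[seg@0(1000)], [seg@1050(900)], [seg@1000(20), seg@1020(30), seg@1950(10)]]
        System.out.println(slices(segs, 500, 3));
    }
}
```

With a 500-document budget and a cap of 3 segments per slice, the two large segments each get their own slice while the three small ones share one thread instead of occupying three.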



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)
