lucene-dev mailing list archives

From "Michael McCandless (JIRA)"
Subject [jira] [Commented] (LUCENE-7700) Move throughput control and merge aborting out of IndexWriter's core?
Date Fri, 03 Mar 2017 12:02:45 GMT


Michael McCandless commented on LUCENE-7700:

Nice job fixing a few ancient typos :)

Looks like javadocs for the private {{MergeRateLimiter.maybePause}} method are stale?

Why are we creating {{MergeRateLimiter}} on init of MergeThread and then again in {{CMS.wrapForMerge}}?
 Seems like we could cast the current thread to {{MergeThread}} and get its already-created
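A minimal sketch of the cast being suggested here, assuming the merge thread owns its limiter as a field. The class names `SketchRateLimiter`, `SketchMergeThread`, and `WrapForMergeSketch`, the method `currentLimiter`, and the 20 MB/s default are hypothetical stand-ins for illustration, not the classes or values in the patch:

```java
// Hypothetical stand-in for a per-merge rate limiter; not Lucene's MergeRateLimiter.
final class SketchRateLimiter {
    private volatile double mbPerSec;

    SketchRateLimiter(double mbPerSec) { this.mbPerSec = mbPerSec; }

    double getMBPerSec() { return mbPerSec; }

    void setMBPerSec(double mbPerSec) { this.mbPerSec = mbPerSec; }
}

// Hypothetical merge thread that creates its limiter exactly once, at construction.
class SketchMergeThread extends Thread {
    final SketchRateLimiter rateLimiter = new SketchRateLimiter(20.0); // assumed default

    SketchMergeThread(Runnable body) { super(body); }
}

final class WrapForMergeSketch {
    /**
     * Returns the limiter owned by the calling thread if it is a merge thread,
     * or null otherwise. No second limiter is created.
     */
    static SketchRateLimiter currentLimiter() {
        Thread t = Thread.currentThread();
        if (t instanceof SketchMergeThread) {
            return ((SketchMergeThread) t).rateLimiter;
        }
        return null;
    }

    /** Runs currentLimiter() on a merge thread and checks it is that thread's own instance. */
    static boolean mergeThreadSeesOwnLimiter() {
        SketchRateLimiter[] seen = new SketchRateLimiter[1];
        SketchMergeThread mt = new SketchMergeThread(() -> seen[0] = currentLimiter());
        mt.start();
        try {
            mt.join(); // join() makes the write to seen[0] visible here
        } catch (InterruptedException e) {
            Thread.currentThread().interrupt();
            return false;
        }
        return seen[0] == mt.rateLimiter;
    }

    public static void main(String[] args) {
        System.out.println("merge thread sees own limiter: " + mergeThreadSeesOwnLimiter());
        System.out.println("non-merge thread gets null: " + (currentLimiter() == null));
    }
}
```

The null return for non-merge threads is one possible choice for this sketch; real code might prefer to fail loudly, since calling this from any other thread would be misuse.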

Why not call {{updateIOThrottle}} from the main CMS thread instead of the merge thread?  Otherwise, I think
we add a delay between when a backlog'd merge shows up and when the already running
merge threads kick up their IO throttle?
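To illustrate the timing concern, here is a rough sketch in which the scheduling thread itself pushes a new target rate to every running merge as soon as a backlog is noticed, so running merges do not wait for a newly started merge thread to do it. Everything here is an assumption for illustration: `ThrottleLimiter`, `ThrottleScheduler`, `onMergeQueued`, the 20 MB/s starting rate, and the 1.5x ramp factor are hypothetical, not CMS's actual API or tuning:

```java
import java.util.ArrayList;
import java.util.List;

// Hypothetical limiter whose rate can be changed while a merge is running.
final class ThrottleLimiter {
    private volatile double mbPerSec; // volatile: merge threads see updates promptly

    ThrottleLimiter(double mbPerSec) { this.mbPerSec = mbPerSec; }

    double getMBPerSec() { return mbPerSec; }

    void setMBPerSec(double mbPerSec) { this.mbPerSec = mbPerSec; }
}

// Hypothetical scheduler that keeps track of the limiters of running merges.
final class ThrottleScheduler {
    private final List<ThrottleLimiter> running = new ArrayList<>();
    private double targetMBPerSec = 20.0; // assumed starting rate

    /** Registers a new running merge and hands it a limiter at the current target rate. */
    synchronized ThrottleLimiter startMerge() {
        ThrottleLimiter limiter = new ThrottleLimiter(targetMBPerSec);
        running.add(limiter);
        return limiter;
    }

    /** Unregisters a finished merge. */
    synchronized void finishMerge(ThrottleLimiter limiter) {
        running.remove(limiter);
    }

    /**
     * Called on the scheduling thread when a merge is queued. If merges are
     * backing up, raise the target and apply it to all running merges at once.
     */
    synchronized void onMergeQueued(boolean backlogged) {
        if (backlogged) {
            targetMBPerSec *= 1.5; // assumed ramp factor
        }
        for (ThrottleLimiter limiter : running) {
            limiter.setMBPerSec(targetMBPerSec);
        }
    }

    public static void main(String[] args) {
        ThrottleScheduler scheduler = new ThrottleScheduler();
        ThrottleLimiter first = scheduler.startMerge();
        System.out.println("before backlog: " + first.getMBPerSec());
        scheduler.onMergeQueued(true); // backlog detected on the scheduling thread
        System.out.println("after backlog:  " + first.getMBPerSec());
    }
}
```

In this arrangement the already running merge sees the higher rate the next time it reads its limiter, without depending on when a new merge thread gets scheduled.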

Maybe add a comment to {{OneMergeProgress.owner}} and {{.setMergeThread}} that it's only used
for catching misuse?

Can we rename {{OneMergeProgress.pauseTimes}} -> {{pauseTimesNanos}} or NS?

You can just remove the commented-out {{//private final Directory mergeDirectory}} from IW.

Hmm it looks like CFS building is still unthrottled?

> Move throughput control and merge aborting out of IndexWriter's core?
> ---------------------------------------------------------------------
>                 Key: LUCENE-7700
>                 URL:
>             Project: Lucene - Core
>          Issue Type: Improvement
>            Reporter: Dawid Weiss
>            Assignee: Dawid Weiss
>            Priority: Minor
>         Attachments: LUCENE-7700.patch, LUCENE-7700.patch
> Here is a bit of a background:
> - I wanted to implement a custom merging strategy that would have a custom i/o flow control
> - currently, the CMS is tightly bound with a few classes -- MergeRateLimiter, OneMerge,
> Looking at the code it seems to me that everything with respect to I/O control could
> be nicely pulled out into classes that explicitly control the merging process, that is only
> MergePolicy and MergeScheduler. By default, one could even run without any additional I/O
> accounting overhead (which is currently in there, even if one doesn't use the CMS's throughput
> Such refactoring would also give a chance to nicely move things where they belong --
> job aborting into OneMerge (currently in RateLimiter), rate limiter lifecycle bound to OneMerge
> (MergeScheduler could then use per-merge or global accounting, as it pleases).
> Just a thought and some initial refactorings for discussion.
