lucene-dev mailing list archives

From "Michael McCandless (JIRA)" <j...@apache.org>
Subject [jira] Commented: (LUCENE-2755) Some improvements to CMS
Date Mon, 15 Nov 2010 19:38:13 GMT

    [ https://issues.apache.org/jira/browse/LUCENE-2755?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12932162#action_12932162 ]

Michael McCandless commented on LUCENE-2755:
--------------------------------------------

bq. We drop IW.getNextMerge, MS.merge(IW), and replace them with MS.scheduleMerge(MP.OM),
so instead of IW asking MS to pull all merges from itself, it simply pushes them.

That sounds like a great simplification!

bq. I'm not against sorting merges, it's so simple, even if useless. Though maybe it's better
to use Comparator, so you can redefine the order? Pausing large merges is another issue -
that's a freakload of complexity for zero gain.

Pausing large merges is (unfortunately) important for making full use of available concurrency.
Otherwise, when a laaarge merge is taking place, it causes you to fully stop your indexing
threads unnecessarily.  Turn on infoStream when building a large index and you'll see...

An OS CPU scheduler will lower the priority of long-running, CPU-hogging processes for the
same reason (so that newly started, short-running CPU hogs get nearly 100% of the CPU and
finish fast).  It's just that, since we don't have the "freedom" to allow an unbounded
number of merges, we must "approximate" this by explicitly pausing the long-running merges.
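
A minimal sketch of that pausing policy, assuming a cap on how many merges may run at once:
the smallest merges keep running and the larger ones are paused until a slot frees up.
RunningMerge, MergePauser and updatePauseState are hypothetical names used for illustration
only; this is not the CMS code.

```java
import java.util.ArrayList;
import java.util.Comparator;
import java.util.List;

// Hypothetical stand-in for a merge that is currently running; not Lucene's OneMerge.
final class RunningMerge {
    final String name;
    final long totalBytes;
    boolean paused;

    RunningMerge(String name, long totalBytes) {
        this.name = name;
        this.totalBytes = totalBytes;
    }
}

// Hypothetical helper: decides which running merges should be paused.
final class MergePauser {
    // Let the maxRunning smallest merges proceed and pause every larger one.
    static void updatePauseState(List<RunningMerge> active, int maxRunning) {
        List<RunningMerge> bySize = new ArrayList<>(active);
        bySize.sort(Comparator.comparingLong((RunningMerge m) -> m.totalBytes));
        for (int i = 0; i < bySize.size(); i++) {
            bySize.get(i).paused = i >= maxRunning;
        }
    }
}
```

Re-running updatePauseState whenever a merge starts or finishes would un-pause a large merge
as soon as the smaller ones are done, which is roughly the effect described above.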

bq. Also, let's kill this weeeird IW.mergeInit that is called from CMS, but not SMS

There was some reason why this needed to be called by CMS but not SMS, but I can't remember
what it was.  (It's re-called by IW.merge in case the MS didn't already call it.)  But it'd be
great to not call it from CMS if it's not necessary.

bq. With introduction of executors, and SMS being folded as a special case of CMS, we might
as well drop MS completely and move what little code is left straight to IW, which will now
accept an executor.

That's tempting... but people use MSs, e.g., to schedule big merges at different times.  I don't
think we should outright drop MS.
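
As a rough illustration of why a separate scheduler hook is still useful even on top of an
executor, here is a sketch of a push-style scheduler that runs small merges right away and
holds big ones back until the application asks for them. DeferringMergeScheduler, its
scheduleMerge(Runnable, long) signature, runDeferred and the byte threshold are all
assumptions made for this example, not existing or proposed Lucene API.

```java
import java.util.ArrayDeque;
import java.util.Queue;
import java.util.concurrent.Executor;

// Hypothetical push-style scheduler: the writer hands over each merge as it is selected.
final class DeferringMergeScheduler {
    private final Executor executor;
    private final long bigMergeBytes;
    private final Queue<Runnable> deferred = new ArrayDeque<>();

    DeferringMergeScheduler(Executor executor, long bigMergeBytes) {
        this.executor = executor;
        this.bigMergeBytes = bigMergeBytes;
    }

    // Small merges go straight to the executor; big ones are queued for later.
    synchronized void scheduleMerge(Runnable merge, long estimatedBytes) {
        if (estimatedBytes >= bigMergeBytes) {
            deferred.add(merge);
        } else {
            executor.execute(merge);
        }
    }

    // Called by the application (for example off-peak); returns how many merges were released.
    synchronized int runDeferred() {
        int released = 0;
        Runnable merge;
        while ((merge = deferred.poll()) != null) {
            executor.execute(merge);
            released++;
        }
        return released;
    }
}
```

With only a bare executor inside IW there would be no obvious place to put the "hold this
one for later" decision, which is the kind of policy people plug in through MS today.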

> Some improvements to CMS
> ------------------------
>
>                 Key: LUCENE-2755
>                 URL: https://issues.apache.org/jira/browse/LUCENE-2755
>             Project: Lucene - Java
>          Issue Type: Improvement
>          Components: Index
>            Reporter: Shai Erera
>            Assignee: Shai Erera
>            Priority: Minor
>             Fix For: 3.1, 4.0
>
>
> While running optimize on a large index, I've noticed several things that got me to read
> CMS code more carefully, and find these issues:
> * CMS may hold onto a merge if maxMergeCount is hit. That results in the MergeThreads
> taking merges from the IndexWriter until they are exhausted, and only then that blocked merge
> will run. I think it's unnecessary that that merge will be blocked.
> * CMS sorts merges by segments size, doc-based and not bytes-based. Since the default
> MP is LogByteSizeMP, and I hardly believe people care about doc-based size segments anymore,
> I think we should switch the default impl. There are two ways to make it extensible, if we
> want:
> ** Have an overridable member/method in CMS that you can extend and override - easy.
> ** Have OneMerge be comparable and let the MP determine the order (e.g. by bytes, docs,
> calibrate deletes etc.). Better, but will need to tap into several places in the code, so
> more risky and complicated.
> On the go, I'd like to add some documentation to CMS - it's not very easy to read and
> follow.
> I'll work on a patch.
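
To make the doc-based versus bytes-based point concrete, here is a small sketch of a
pluggable merge order. PendingMerge and MergeOrders are hypothetical stand-ins (not
OneMerge or any CMS field), and ascending order is chosen arbitrarily; the point is only
that the two orderings can disagree for the same pair of merges.

```java
import java.util.Comparator;

// Hypothetical, simplified description of a pending merge.
final class PendingMerge {
    final String name;
    final long totalBytes;
    final int totalDocs;

    PendingMerge(String name, long totalBytes, int totalDocs) {
        this.name = name;
        this.totalBytes = totalBytes;
        this.totalDocs = totalDocs;
    }
}

// Two interchangeable orderings; a scheduler could accept either as a Comparator.
final class MergeOrders {
    // Smallest merge first, measured in bytes.
    static final Comparator<PendingMerge> BY_BYTES =
        Comparator.comparingLong((PendingMerge m) -> m.totalBytes);

    // Smallest merge first, measured in document count.
    static final Comparator<PendingMerge> BY_DOCS =
        Comparator.comparingInt((PendingMerge m) -> m.totalDocs);
}
```

For instance, a merge of 1,000,000 tiny documents totalling 100 MB sorts as the larger one
under BY_DOCS, while a merge of 10,000 documents totalling 2 GB sorts as the larger one
under BY_BYTES.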

-- 
This message is automatically generated by JIRA.
-
You can reply to this email to add a comment to the issue online.


---------------------------------------------------------------------
To unsubscribe, e-mail: dev-unsubscribe@lucene.apache.org
For additional commands, e-mail: dev-help@lucene.apache.org

