lucene-dev mailing list archives

From "Shai Erera (JIRA)" <j...@apache.org>
Subject [jira] Commented: (LUCENE-2755) Some improvements to CMS
Date Wed, 17 Nov 2010 04:22:13 GMT

    [ https://issues.apache.org/jira/browse/LUCENE-2755?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12932807#action_12932807 ]

Shai Erera commented on LUCENE-2755:
------------------------------------

bq. You contradict yourself here. If we make OneMerge comparable, we define order in its compareTo() method.

I think it's convenient for OneMerge to be comparable in some way. But MP can also sort merges
using its own Comparator. By making OneMerge Comparable I meant to say 'this is the default
order', but we can have a DefaultComparator instead.
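A rough sketch of the two options (a built-in default order vs. an order supplied by the MP). PendingMerge is a hypothetical stand-in for OneMerge and MergeOrdering is a hypothetical policy-side hook; neither exists in Lucene under these names, and the sizes are precomputed just for illustration:

```java
import java.util.ArrayList;
import java.util.Comparator;
import java.util.List;

// Hypothetical stand-in for OneMerge with precomputed sizes (not a Lucene class).
final class PendingMerge {
  final String name;
  final long sizeInBytes;
  final int docCount;

  PendingMerge(String name, long sizeInBytes, int docCount) {
    this.name = name;
    this.sizeInBytes = sizeInBytes;
    this.docCount = docCount;
  }

  // Hypothetical "DefaultComparator": smaller merges (by bytes) first.
  static final Comparator<PendingMerge> DEFAULT_COMPARATOR =
      Comparator.comparingLong((PendingMerge m) -> m.sizeInBytes);

  // An alternative order a policy could supply instead of the default.
  static final Comparator<PendingMerge> BY_DOC_COUNT =
      Comparator.comparingInt((PendingMerge m) -> m.docCount);
}

// Hypothetical hook on the merge policy: the policy owns the ordering,
// so the merge class itself does not need to implement Comparable.
interface MergeOrdering {
  default Comparator<PendingMerge> mergeComparator() {
    return PendingMerge.DEFAULT_COMPARATOR;
  }
}

final class Scheduling {
  // Returns a sorted copy; the caller's list is left untouched.
  static List<PendingMerge> order(List<PendingMerge> pending, MergeOrdering policy) {
    List<PendingMerge> sorted = new ArrayList<>(pending);
    sorted.sort(policy.mergeComparator());
    return sorted;
  }
}
```

A policy that does nothing gets the default order; a policy that overrides mergeComparator() changes the order without touching the merge class.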

If we proceed w/ your proposal, i.e. MS/ME polling MP rather than IW doing so, how would IW
know about the running and pending merges? Today IW tracks those two lists so that, if merges
need to be aborted, it knows which ones to abort.

We can work around aborting the running merges by introducing an MS.abort()-like method. But
what about MP? Now the lists are divided between two entities (MP and MS), and aborting an
MP does not make sense (doable, but I don't think it belongs there). Maybe MS.abort() can
poll MP for the next merges until it returns null, throwing away all the returned ones; that
can be done. Aborting an Executor is easy, and I think it can be faster than our current way
of doing so.
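A minimal sketch of the MS.abort() idea, assuming a hypothetical PendingMergeSource that plays the role of MP (nextMerge() returns null when nothing is left) and a standard ExecutorService that runs the merges. PendingMergeSource, SimpleMergeSource and ExecutorMergeScheduler are illustrative names, not Lucene APIs:

```java
import java.util.ArrayDeque;
import java.util.Deque;
import java.util.concurrent.ExecutorService;
import java.util.concurrent.Executors;
import java.util.concurrent.TimeUnit;

// Hypothetical source of pending merges, standing in for MP (not a Lucene API).
interface PendingMergeSource {
  /** Returns the next pending merge, or null when none are left. */
  Runnable nextMerge();
}

// Hypothetical in-memory source used for the example.
final class SimpleMergeSource implements PendingMergeSource {
  private final Deque<Runnable> pending = new ArrayDeque<>();

  synchronized void add(Runnable merge) { pending.add(merge); }

  @Override public synchronized Runnable nextMerge() { return pending.poll(); }
}

// Hypothetical scheduler that owns the running merges through an Executor.
final class ExecutorMergeScheduler {
  private final ExecutorService executor = Executors.newFixedThreadPool(2);
  private final PendingMergeSource source;

  ExecutorMergeScheduler(PendingMergeSource source) { this.source = source; }

  /** Submits every merge the source currently has. */
  void runPending() {
    Runnable merge;
    while ((merge = source.nextMerge()) != null) {
      executor.execute(merge);
    }
  }

  /** Discards pending merges, interrupts running ones; returns how many never ran. */
  int abort() {
    int discarded = 0;
    // Poll the source until it returns null, throwing every returned merge away.
    while (source.nextMerge() != null) {
      discarded++;
    }
    // shutdownNow() interrupts running tasks and returns queued, unstarted ones.
    discarded += executor.shutdownNow().size();
    try {
      executor.awaitTermination(10, TimeUnit.SECONDS);
    } catch (InterruptedException e) {
      Thread.currentThread().interrupt(); // preserve the caller's interrupt status
    }
    return discarded;
  }
}
```

shutdownNow() only interrupts running tasks, so a real merge would have to check for interruption (or an abort flag) to actually stop early.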

I would still love to see the merge code (as much as possible) moved out of IW. This may
not be doable now, but could be in the future, if we factor out a SegmentsMerger/IndexMerger
entity which encapsulates both merge execution and policy. But this is for another day.

BTW, MS.merge() takes an IW, as if you could call merge() w/ two IW instances and things would
work OK. That holds for SMS but not for CMS. Should we, in the scope of this issue, make IW
a required settable parameter on MS, like we do w/ MP?
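A sketch of what binding the writer once, rather than passing it to every merge() call, might look like. IndexWriterStub, BoundMergeScheduler, setWriter() and RecordingScheduler are hypothetical names, not the actual MergeScheduler API:

```java
// Hypothetical stand-in for IndexWriter; only its identity matters here.
final class IndexWriterStub {
  final String name;

  IndexWriterStub(String name) { this.name = name; }
}

// Hypothetical scheduler base: the writer is a required, settable parameter.
abstract class BoundMergeScheduler {
  private IndexWriterStub writer;

  /** Binds this scheduler to a writer; binding a second, different writer is rejected. */
  final synchronized void setWriter(IndexWriterStub w) {
    if (w == null) throw new IllegalArgumentException("writer must not be null");
    if (writer != null && writer != w) {
      throw new IllegalStateException("scheduler is already bound to " + writer.name);
    }
    writer = w;
  }

  /** Returns the bound writer, failing fast if setWriter() was never called. */
  final synchronized IndexWriterStub writer() {
    if (writer == null) throw new IllegalStateException("setWriter() was not called");
    return writer;
  }

  /** No writer argument: merges always run against the bound writer. */
  abstract void merge();
}

// Hypothetical concrete scheduler that just counts merge() calls.
final class RecordingScheduler extends BoundMergeScheduler {
  int merges;

  @Override void merge() {
    writer(); // throws if unbound
    merges++;
  }
}
```

Rejecting a second writer makes the "one scheduler per IW" assumption that CMS already relies on explicit, instead of leaving it implied by the merge(IW) signature.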

> Some improvements to CMS
> ------------------------
>
>                 Key: LUCENE-2755
>                 URL: https://issues.apache.org/jira/browse/LUCENE-2755
>             Project: Lucene - Java
>          Issue Type: Improvement
>          Components: Index
>            Reporter: Shai Erera
>            Assignee: Shai Erera
>            Priority: Minor
>             Fix For: 3.1, 4.0
>
>
> While running optimize on a large index, I've noticed several things that got me to read CMS code more carefully, and find these issues:
> * CMS may hold onto a merge if maxMergeCount is hit. That results in the MergeThreads taking merges from the IndexWriter until they are exhausted, and only then will that blocked merge run. I think there's no need to block that merge.
> * CMS sorts merges by segment size measured in docs, not bytes. Since the default MP is LogByteSizeMP, and I doubt people care about doc-based segment sizes anymore, I think we should switch the default impl. There are two ways to make it extensible, if we want:
> ** Have an overridable member/method in CMS that you can extend and override - easy.
> ** Have OneMerge be comparable and let the MP determine the order (e.g. by bytes, docs, calibrate deletes etc.). Better, but will need to tap into several places in the code, so more risky and complicated.
> Along the way, I'd like to add some documentation to CMS - it's not very easy to read and follow.
> I'll work on a patch.
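A sketch of the first ("easy") option from the issue description: a single overridable size method in the scheduler, byte-based by default, with a subclass restoring doc-based sizing. MergeCandidate, SizeSortingScheduler, DocCountSortingScheduler and mergeSize() are hypothetical names, and the largest-first direction is an arbitrary choice for the example:

```java
import java.util.ArrayList;
import java.util.Comparator;
import java.util.List;

// Hypothetical stand-in for a pending merge with precomputed sizes.
final class MergeCandidate {
  final String name;
  final long bytes;
  final long docs;

  MergeCandidate(String name, long bytes, long docs) {
    this.name = name;
    this.bytes = bytes;
    this.docs = docs;
  }
}

// Hypothetical scheduler: one overridable method defines "size" for ordering.
class SizeSortingScheduler {
  /** Size used for ordering; byte-based by default, override to change the metric. */
  protected long mergeSize(MergeCandidate m) {
    return m.bytes;
  }

  /** Returns a copy of the merges ordered largest first by mergeSize(). */
  final List<MergeCandidate> sortLargestFirst(List<MergeCandidate> merges) {
    List<MergeCandidate> sorted = new ArrayList<>(merges);
    sorted.sort(Comparator.<MergeCandidate>comparingLong(this::mergeSize).reversed());
    return sorted;
  }
}

// Restores doc-based ordering for anyone who still wants it.
class DocCountSortingScheduler extends SizeSortingScheduler {
  @Override protected long mergeSize(MergeCandidate m) {
    return m.docs;
  }
}
```

Compared with making OneMerge comparable, this keeps the change inside the scheduler, which matches the "easy, less risky" trade-off described above.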

-- 
This message is automatically generated by JIRA.
-
You can reply to this email to add a comment to the issue online.

