lucene-dev mailing list archives

From "Shai Erera (JIRA)" <j...@apache.org>
Subject [jira] Commented: (LUCENE-2755) Some improvements to CMS
Date Fri, 12 Nov 2010 17:54:13 GMT

    [ https://issues.apache.org/jira/browse/LUCENE-2755?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12931468#action_12931468 ]

Shai Erera commented on LUCENE-2755:
------------------------------------

Ok, so deferring the call to IndexWriter.getNextMerge() until we know the merge can be
registered is problematic. We want to know whether there is a next merge before we check
whether it can be registered: if there is none, the method returns immediately. If we defer
the call, we'd wait until a merge can be registered, only to discover there are no more merges.

So one solution is to add a hasMerges() method to IW and have CMS wait for room to become
available only if there are merges.
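
A minimal Java sketch of that idea, using toy stand-ins rather than the real IndexWriter
and CMS classes. ToyWriter, ToyScheduler, takeNext() and the hasMerges() signature are all
assumptions made for illustration; the point is only that the scheduler asks whether
anything is pending before it waits for a free slot.

```java
import java.util.ArrayDeque;
import java.util.Queue;

/** Toy stand-ins only; these are not Lucene's IndexWriter or CMS classes. */
class HasMergesSketch {

    /** Hypothetical writer that offers a non-consuming check next to getNextMerge(). */
    static class ToyWriter {
        private final Queue<String> pending = new ArrayDeque<>();

        synchronized void addMerge(String name) { pending.add(name); }

        /** Hypothetical method: true if getNextMerge() would currently return a merge. */
        synchronized boolean hasMerges() { return !pending.isEmpty(); }

        /** Removes and returns the next pending merge, or null if there is none. */
        synchronized String getNextMerge() { return pending.poll(); }
    }

    /** Toy scheduler that caps how many merges may run at once. */
    static class ToyScheduler {
        private final int maxMergeCount;
        private int running = 0;

        ToyScheduler(int maxMergeCount) { this.maxMergeCount = maxMergeCount; }

        /**
         * Returns the next merge to run, or null if nothing is pending.
         * Blocks for a free slot only when the writer reports pending merges.
         */
        synchronized String takeNext(ToyWriter writer) {
            if (!writer.hasMerges()) {
                return null; // nothing to do, so do not wait for room
            }
            while (running >= maxMergeCount) {
                try {
                    wait(); // woken up by mergeFinished()
                } catch (InterruptedException e) {
                    Thread.currentThread().interrupt();
                    return null;
                }
            }
            String merge = writer.getNextMerge(); // may be null if another thread took it
            if (merge != null) {
                running++;
            }
            return merge;
        }

        synchronized void mergeFinished() {
            running--;
            notifyAll();
        }

        synchronized int runningCount() { return running; }
    }

    public static void main(String[] args) {
        ToyWriter writer = new ToyWriter();
        ToyScheduler cms = new ToyScheduler(1);
        System.out.println(cms.takeNext(writer)); // nothing pending, so no waiting
        writer.addMerge("m1");
        System.out.println(cms.takeNext(writer));
    }
}
```

Note that the check and the later getNextMerge() call are not atomic in this sketch, so the
caller still has to handle a null result.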

Another solution is to do a larger change to CMS and introduce an ExecutorService - this has
been raised in the past, so perhaps it's time to finally do it? By using a blocking queue,
we don't need to implement any waiting logic - Java will do it for us.

The downside is that I'm not sure we can control which merges run and which wait. Perhaps
we can hack this through - I'll need to start on it to tell for sure. This feature is
important: today CMS guarantees that the smaller merges run first, so if a larger merge was
registered before a smaller one, we'd still want to execute the smaller one before the
larger.
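
One possible way to keep smallest-first ordering with an executor, sketched in plain Java
under stated assumptions: a ThreadPoolExecutor backed by a PriorityBlockingQueue hands
queued tasks to workers in priority order. MergeTask and its sizeBytes field are
hypothetical stand-ins, not Lucene classes, and this is not necessarily how CMS would do it.

```java
import java.util.List;
import java.util.concurrent.CopyOnWriteArrayList;
import java.util.concurrent.CountDownLatch;
import java.util.concurrent.PriorityBlockingQueue;
import java.util.concurrent.ThreadPoolExecutor;
import java.util.concurrent.TimeUnit;

class PriorityMergeExecutorSketch {

    /** Hypothetical merge task; a smaller sizeBytes sorts first. */
    static class MergeTask implements Runnable, Comparable<MergeTask> {
        final long sizeBytes;
        final Runnable body;

        MergeTask(long sizeBytes, Runnable body) {
            this.sizeBytes = sizeBytes;
            this.body = body;
        }

        @Override public void run() { body.run(); }

        @Override public int compareTo(MergeTask other) {
            return Long.compare(sizeBytes, other.sizeBytes);
        }
    }

    /** Registers large, small, medium (in that order) and returns the order they ran in. */
    static List<String> runOrder() {
        List<String> ran = new CopyOnWriteArrayList<>();
        CountDownLatch gate = new CountDownLatch(1);
        // One worker thread; queued tasks are handed out smallest first.
        ThreadPoolExecutor pool = new ThreadPoolExecutor(
                1, 1, 0L, TimeUnit.MILLISECONDS, new PriorityBlockingQueue<Runnable>());
        try {
            // Occupy the only worker so the next three tasks pile up in the queue.
            pool.execute(new MergeTask(0L, () -> {
                try {
                    gate.await();
                } catch (InterruptedException e) {
                    Thread.currentThread().interrupt();
                }
            }));
            pool.execute(new MergeTask(900L, () -> ran.add("large")));
            pool.execute(new MergeTask(10L, () -> ran.add("small")));
            pool.execute(new MergeTask(300L, () -> ran.add("medium")));
            gate.countDown(); // free the worker; it now drains the queue by priority
        } finally {
            pool.shutdown(); // already queued tasks still run
        }
        try {
            pool.awaitTermination(10, TimeUnit.SECONDS);
        } catch (InterruptedException e) {
            Thread.currentThread().interrupt();
        }
        return ran;
    }

    public static void main(String[] args) {
        System.out.println(runOrder());
    }
}
```

The sketch uses execute() rather than submit(), because submit() wraps each task in a
FutureTask, which is not Comparable and so cannot go into a PriorityBlockingQueue that
relies on natural ordering. PriorityBlockingQueue is also unbounded, so any limit like
maxMergeCount would have to be enforced some other way.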

A third solution would be to not do anything and keep things as they are - namely, let CMS
hold a merge until it can be executed.

Just summarizing my thoughts for now.

> Some improvements to CMS
> ------------------------
>
>                 Key: LUCENE-2755
>                 URL: https://issues.apache.org/jira/browse/LUCENE-2755
>             Project: Lucene - Java
>          Issue Type: Improvement
>          Components: Index
>            Reporter: Shai Erera
>            Assignee: Shai Erera
>            Priority: Minor
>             Fix For: 3.1, 4.0
>
>
> While running optimize on a large index, I've noticed several things that got me to read
> CMS code more carefully, and find these issues:
> * CMS may hold onto a merge if maxMergeCount is hit. That results in the MergeThreads
> taking merges from the IndexWriter until they are exhausted, and only then will that
> blocked merge run. I think it's unnecessary for that merge to be blocked.
> * CMS sorts merges by segment size, doc-based and not bytes-based. Since the default
> MP is LogByteSizeMP, and I hardly believe people care about doc-based segment sizes anymore,
> I think we should switch the default impl. There are two ways to make it extensible, if we
> want:
> ** Have an overridable member/method in CMS that you can extend and override - easy.
> ** Have OneMerge be comparable and let the MP determine the order (e.g. by bytes, docs,
> calibrate deletes etc.). Better, but will need to tap into several places in the code, so
> more risky and complicated.
> On the go, I'd like to add some documentation to CMS - it's not very easy to read and
> follow.
> I'll work on a patch.
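
The first extensibility option in the quoted description (an overridable method in CMS)
could look roughly like the following Java sketch. ToyMerge, ToyScheduler, mergeOrder() and
DocCountScheduler are hypothetical names used only for illustration; they are not part of
Lucene. The default orders merges by bytes, and a subclass switches to doc counts.

```java
import java.util.ArrayList;
import java.util.Comparator;
import java.util.List;

class MergeOrderSketch {

    /** Toy stand-in for a merge; not Lucene's OneMerge. */
    static class ToyMerge {
        final String name;
        final int docCount;
        final long totalBytes;

        ToyMerge(String name, int docCount, long totalBytes) {
            this.name = name;
            this.docCount = docCount;
            this.totalBytes = totalBytes;
        }
    }

    /** Toy scheduler with an overridable ordering hook (hypothetical method name). */
    static class ToyScheduler {
        /** Default: smaller merges by byte size come first. */
        protected Comparator<ToyMerge> mergeOrder() {
            return Comparator.comparingLong(m -> m.totalBytes);
        }

        /** Returns merge names in the order this scheduler would run them. */
        final List<String> sortedNames(List<ToyMerge> merges) {
            List<ToyMerge> copy = new ArrayList<>(merges);
            copy.sort(mergeOrder());
            List<String> names = new ArrayList<>();
            for (ToyMerge m : copy) {
                names.add(m.name);
            }
            return names;
        }
    }

    /** Subclass that restores doc-count-based ordering by overriding the hook. */
    static class DocCountScheduler extends ToyScheduler {
        @Override protected Comparator<ToyMerge> mergeOrder() {
            return Comparator.comparingInt(m -> m.docCount);
        }
    }

    public static void main(String[] args) {
        List<ToyMerge> merges = new ArrayList<>();
        merges.add(new ToyMerge("manySmallDocs", 1000, 1_000L));
        merges.add(new ToyMerge("fewLargeDocs", 10, 1_000_000L));
        System.out.println(new ToyScheduler().sortedNames(merges));
        System.out.println(new DocCountScheduler().sortedNames(merges));
    }
}
```

With these two toy merges the byte-based and doc-based orderings disagree, which is the
case the description is concerned about.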

-- 
This message is automatically generated by JIRA.
-
You can reply to this email to add a comment to the issue online.


---------------------------------------------------------------------
To unsubscribe, e-mail: dev-unsubscribe@lucene.apache.org
For additional commands, e-mail: dev-help@lucene.apache.org

