lucene-dev mailing list archives

Site index · List index
Message view « Date » · « Thread »
Top « Date » · « Thread »
From "Jason Rutherglen (JIRA)" <j...@apache.org>
Subject [jira] Commented: (LUCENE-2755) Some improvements to CMS
Date Tue, 16 Nov 2010 18:54:13 GMT

    [ https://issues.apache.org/jira/browse/LUCENE-2755?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12932599#action_12932599
] 

Jason Rutherglen commented on LUCENE-2755:
------------------------------------------

bq. large merge can easily kill your NRT reopens

When RT is implemented, these small segments (that require merging) go
away because we'll be flushing relatively medium sized segments the size
of the RAM buffer.

I'd prefer to have more control of large merges from an external process
so that they may be scheduled according to application demand, ie, during
non-peak hours. This is actually what I've implemented in production using
things like optimize num segments = 5 and/or expunge deletes during the
early morning hours. However the external control is doable today using an
existing merge policy such as LogByteSizeMergePolicy, where during the day
for example, the maximum segment size could be lower, and in the early
morning, it'd be set to something much higher, or nullified altogether. 

Also the problems associated with merge interleaving only affects systems
that are not using replication because the merging is occurring on an
index-only server. The newly merged files are transferred over as is, and
that's something that can easily be interleaved with other IO processes.

> Some improvements to CMS
> ------------------------
>
>                 Key: LUCENE-2755
>                 URL: https://issues.apache.org/jira/browse/LUCENE-2755
>             Project: Lucene - Java
>          Issue Type: Improvement
>          Components: Index
>            Reporter: Shai Erera
>            Assignee: Shai Erera
>            Priority: Minor
>             Fix For: 3.1, 4.0
>
>
> While running optimize on a large index, I've noticed several things that got me to read
CMS code more carefully, and find these issues:
> * CMS may hold onto a merge if maxMergeCount is hit. That results in the MergeThreads
taking merges from the IndexWriter until they are exhausted, and only then that blocked merge
will run. I think it's unnecessary that that merge will be blocked.
> * CMS sorts merges by segments size, doc-based and not bytes-based. Since the default
MP is LogByteSizeMP, and I hardly believe people care about doc-based size segments anymore,
I think we should switch the default impl. There are two ways to make it extensible, if we
want:
> ** Have an overridable member/method in CMS that you can extend and override - easy.
> ** Have OneMerge be comparable and let the MP determine the order (e.g. by bytes, docs,
calibrate deletes etc.). Better, but will need to tap into several places in the code, so
more risky and complicated.
> On the go, I'd like to add some documentation to CMS - it's not very easy to read and
follow.
> I'll work on a patch.

-- 
This message is automatically generated by JIRA.
-
You can reply to this email to add a comment to the issue online.


---------------------------------------------------------------------
To unsubscribe, e-mail: dev-unsubscribe@lucene.apache.org
For additional commands, e-mail: dev-help@lucene.apache.org


Mime
View raw message