lucene-dev mailing list archives

From "Shai Erera (JIRA)" <j...@apache.org>
Subject [jira] Commented: (LUCENE-2755) Some improvements to CMS
Date Thu, 02 Dec 2010 09:13:10 GMT

    [ https://issues.apache.org/jira/browse/LUCENE-2755?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12966029#action_12966029 ]

Shai Erera commented on LUCENE-2755:
------------------------------------

Getting rid of the IndexWriter member in CMS is not trivial w/o an API change. The IndexWriter
member is used for verbose output, and it is accessed by some public/protected API, like
sync() and doMerge(). So on 3x it'd mean deprecating methods, which IMO isn't justified.
On trunk it's easier.

On the other hand, we can consider two things:
# Make IndexWriter ThreadLocal -- the thread that calls merge() will own its ThreadLocal, and
if different threads index to different indexes, then it should work. But it won't help if
the indexing threads are taken from a pool.
# Think about whether we want CMS to be shareable across several IndexWriters at all. I haven't
heard that requirement come up on the list, and if someone attempted it today, things would
definitely break, so I guess no one really does it. Therefore maybe we should leave it to
the users to develop something like that on their own, and maybe even contribute it back. Such
an MS might even be simpler, since it need not implement everything CMS does today (controlling
thread priorities, pausing merge threads etc.).
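
To make the pool caveat in the first option concrete, here is a minimal sketch of one way it can go wrong. It uses a plain String in place of IndexWriter, and ThreadLocalWriterSketch and all of its methods are hypothetical names, not Lucene API. A pooled thread that is reused keeps whatever writer was last bound on it:

```java
import java.util.concurrent.Callable;
import java.util.concurrent.ExecutorService;
import java.util.concurrent.Executors;

// Hypothetical stand-in for a scheduler that remembers "its" writer per thread.
// A String plays the role of IndexWriter; none of this is Lucene API.
public class ThreadLocalWriterSketch {
    private static final ThreadLocal<String> CURRENT_WRITER = new ThreadLocal<>();

    // Called on the thread that invokes merge(): binds the writer to that thread.
    public static void merge(String writer) {
        CURRENT_WRITER.set(writer);
    }

    // Later calls on the same thread (e.g. for verbose messages) see the bound writer.
    public static String writerForMessages() {
        return CURRENT_WRITER.get();
    }

    // Runs two tasks on a single pooled thread and returns what the second task
    // observes: the writer bound by the first task, because the thread is reused.
    public static String observedByPooledTask() {
        ExecutorService pool = Executors.newSingleThreadExecutor();
        try {
            Runnable bind = () -> merge("writerA");
            Callable<String> read = ThreadLocalWriterSketch::writerForMessages;
            pool.submit(bind).get();
            return pool.submit(read).get();
        } catch (Exception e) {
            throw new IllegalStateException(e);
        } finally {
            pool.shutdown();
        }
    }
}
```

With dedicated threads, each thread only ever sees the writer it bound itself. With a pool, the binding follows the thread rather than the index being written.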

bq. The proper route is to take a handful of dirt and sticks and slap together some working
code to illustrate my point. And that's what I'm gonna do.

It'd be great if you did that! Sometimes it's indeed easier to fight over a concrete
example than over theoretical "can and can't work" arguments.

bq. MP will receive SegmentInfos and return OneMerge.

From whom will it receive it? In the case of cascading merges, the merge threads need
to keep asking MP for getNextMerge(MergeType), but they don't have the global picture
IW holds about the existing segments (SegmentInfos). Also, IW keeps track of the segments
that existed when you first called optimize() and doesn't allow the cascading merges to include
segments that didn't exist at the time. Who will do that accounting now?
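
The kind of accounting in question can be sketched roughly as follows. This is a hypothetical illustration and not IndexWriter's actual code: segments are reduced to their names, and OptimizeAccountingSketch and its methods are made-up names.

```java
import java.util.Collection;
import java.util.HashSet;
import java.util.List;
import java.util.Set;

// Hypothetical bookkeeping, not Lucene API: segments are identified by name only.
public class OptimizeAccountingSketch {
    private final Set<String> eligible = new HashSet<>();

    // Snapshot the segments that exist when optimize() is first called.
    public void startOptimize(Collection<String> existingSegments) {
        eligible.clear();
        eligible.addAll(existingSegments);
    }

    // A cascading merge may only include segments from the snapshot,
    // or segments produced by earlier merges of snapshot segments.
    public boolean mayMerge(List<String> candidate) {
        return eligible.containsAll(candidate);
    }

    // When a merge finishes, its inputs are replaced by the merged segment.
    public void mergeFinished(List<String> inputs, String merged) {
        eligible.removeAll(inputs);
        eligible.add(merged);
    }
}
```

Whoever selects the cascading merges needs access to state like this, which is why it matters where SegmentInfos comes from.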

> Some improvements to CMS
> ------------------------
>
>                 Key: LUCENE-2755
>                 URL: https://issues.apache.org/jira/browse/LUCENE-2755
>             Project: Lucene - Java
>          Issue Type: Improvement
>          Components: Index
>            Reporter: Shai Erera
>            Assignee: Shai Erera
>            Priority: Minor
>             Fix For: 3.1, 4.0
>
>
> While running optimize on a large index, I've noticed several things that got me to read
> the CMS code more carefully and find these issues:
> * CMS may hold onto a merge if maxMergeCount is hit. That results in the MergeThreads
> taking merges from the IndexWriter until they are exhausted, and only then will the blocked
> merge run. I think it's unnecessary for that merge to be blocked.
> * CMS sorts merges by segment size, doc-based and not bytes-based. Since the default
> MP is LogByteSizeMP, and I hardly believe people care about doc-based segment sizes anymore,
> I think we should switch the default impl. There are two ways to make it extensible, if we
> want:
> ** Have an overridable member/method in CMS that you can extend and override - easy.
> ** Have OneMerge be comparable and let the MP determine the order (e.g. by bytes, docs,
> calibrate deletes etc.). Better, but will need to tap into several places in the code, so
> more risky and complicated.
> While at it, I'd like to add some documentation to CMS - it's not very easy to read and
> follow.
> I'll work on a patch.
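
The first extensibility option in the description above (an overridable method that decides merge order) might look roughly like this. It is only a sketch under assumptions: PendingMerge, Scheduler and mergeOrder() are hypothetical names rather than CMS or OneMerge API, and the smallest-first direction is chosen purely for illustration.

```java
import java.util.ArrayList;
import java.util.Comparator;
import java.util.List;

public class MergeOrderSketch {
    // Hypothetical stand-in for a pending merge; not Lucene's OneMerge.
    public static final class PendingMerge {
        final String name;
        final int totalDocs;
        final long totalBytes;

        public PendingMerge(String name, int totalDocs, long totalBytes) {
            this.name = name;
            this.totalDocs = totalDocs;
            this.totalBytes = totalBytes;
        }
    }

    // Doc-based ordering, smallest first.
    public static final Comparator<PendingMerge> BY_DOCS =
        Comparator.comparingInt(m -> m.totalDocs);

    // Bytes-based ordering, smallest first.
    public static final Comparator<PendingMerge> BY_BYTES =
        Comparator.comparingLong(m -> m.totalBytes);

    // Hypothetical scheduler with an overridable ordering hook.
    public static class Scheduler {
        // Subclasses override this to change how merges are ordered.
        protected Comparator<PendingMerge> mergeOrder() {
            return BY_BYTES;
        }

        // Returns merge names in the order chosen by mergeOrder().
        public List<String> order(List<PendingMerge> merges) {
            List<PendingMerge> sorted = new ArrayList<>(merges);
            sorted.sort(mergeOrder());
            List<String> names = new ArrayList<>();
            for (PendingMerge m : sorted) {
                names.add(m.name);
            }
            return names;
        }
    }
}
```

A merge with many small documents and one with few large documents sort differently under the two comparators, which is the case where doc-based and bytes-based ordering disagree.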

-- 
This message is automatically generated by JIRA.
-
You can reply to this email to add a comment to the issue online.



