Message-ID: <11470897.64271291281190825.JavaMail.jira@thor>
In-Reply-To: <14053481.25971289483174131.JavaMail.jira@thor>
Date: Thu, 2 Dec 2010 04:13:10 -0500 (EST)
From: "Shai Erera (JIRA)"
To: dev@lucene.apache.org
Subject: [jira] Commented: (LUCENE-2755) Some improvements to CMS

    [ https://issues.apache.org/jira/browse/LUCENE-2755?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12966029#action_12966029 ]

Shai Erera commented on LUCENE-2755:
------------------------------------

Getting rid of the
IndexWriter member in CMS is not trivial without an API change. The IndexWriter member is used for verbose output, and it is accessed by some public/protected API such as sync() and doMerge(). So on 3x, removing it would mean deprecating methods, and IMO the gain does not justify that. On trunk it's easier.

On the other hand, we can consider two things:
# Make IndexWriter ThreadLocal -- the thread that calls merge() will own its ThreadLocal, and if different threads index to different indexes, it should work. But it won't help if the indexing threads are taken from a pool.
# Think about whether we want CMS to be shareable across several IndexWriters at all. I haven't heard that requirement come up on the list, and if someone attempted it today things would definitely break, so I guess no one really does it. Therefore maybe we should leave it to users to develop something like that on their own, and maybe even contribute it back. Such an MS might even be simpler, since it would not need to implement all of the functionality CMS has today (controlling threads' priority, pausing merge threads etc.).

bq. The proper route is to take a handful of dirt and sticks and slap together some working code to illustrate my point. And that's what I'm gonna do.

It'd be great if you do that! Sometimes it's indeed easier to argue over a concrete example than over theoretical "can and can't work" arguments.

bq. MP will receive SegmentInfos and return OneMerge.

From whom will it receive them? In the case of cascading merges, the merge threads need to continuously poll MP for getNextMerge(MergeType), but they don't have the global picture that IW holds of the existing segments (SegmentInfos). Also, IW keeps track of the segments that existed when you first called optimize() and doesn't allow cascading merges to include segments that didn't exist at that time. Who will do that accounting now?
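
To make the pool caveat on the ThreadLocal option concrete, here is a minimal sketch of one way it could go wrong. It is not Lucene code: ThreadLocalWriterSketch, its merge() method and the string "writers" are hypothetical stand-ins, and the bind-once behaviour is an assumption about how such a ThreadLocal might be used.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.concurrent.Callable;
import java.util.concurrent.ExecutionException;
import java.util.concurrent.ExecutorService;
import java.util.concurrent.Executors;

// Hypothetical stand-in for a scheduler that remembers "its" writer per thread.
// Writers are plain strings here; nothing below is Lucene API.
final class ThreadLocalWriterSketch {
    // Assumption: the writer is bound the first time a thread calls merge()
    // and is never rebound afterwards.
    private final ThreadLocal<String> boundWriter = new ThreadLocal<>();

    /** Returns the writer this thread believes it is merging for. */
    String merge(String callingWriter) {
        if (boundWriter.get() == null) {
            boundWriter.set(callingWriter);
        }
        return boundWriter.get();
    }

    /** Submits one merge() call per writer to the pool and records what each call saw. */
    static List<String> runOnPool(ExecutorService pool, String... writers) {
        ThreadLocalWriterSketch scheduler = new ThreadLocalWriterSketch();
        List<String> seen = new ArrayList<>();
        try {
            for (String w : writers) {
                Callable<String> task = () -> scheduler.merge(w);
                seen.add(pool.submit(task).get());
            }
        } catch (InterruptedException | ExecutionException e) {
            throw new IllegalStateException(e);
        } finally {
            pool.shutdown();
        }
        return seen;
    }

    public static void main(String[] args) {
        // One pooled thread serves both writers, so the second call sees a stale binding.
        System.out.println(runOnPool(Executors.newSingleThreadExecutor(), "writerA", "writerB"));
    }
}
```

With a single pooled thread, the call made on behalf of writerB still observes writerA, because the thread-local value belongs to the thread rather than to the caller's index.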
> Some improvements to CMS
> ------------------------
>
>                 Key: LUCENE-2755
>                 URL: https://issues.apache.org/jira/browse/LUCENE-2755
>             Project: Lucene - Java
>          Issue Type: Improvement
>          Components: Index
>            Reporter: Shai Erera
>            Assignee: Shai Erera
>            Priority: Minor
>             Fix For: 3.1, 4.0
>
>
> While running optimize on a large index, I noticed several things that got me to read the CMS code more carefully, and I found these issues:
> * CMS may hold onto a merge if maxMergeCount is hit. As a result, the MergeThreads take merges from the IndexWriter until they are exhausted, and only then does the blocked merge run. I think blocking that merge is unnecessary.
> * CMS sorts merges by segment size measured in docs, not bytes. Since the default MP is LogByteSizeMP, and I doubt people care about doc-based segment sizes anymore, I think we should switch the default impl. There are two ways to make it extensible, if we want:
> ** Have an overridable member/method in CMS that you can extend and override - easy.
> ** Have OneMerge be comparable and let the MP determine the order (e.g. by bytes, docs, calibrate deletes etc.). Better, but it will need to tap into several places in the code, so it is riskier and more complicated.
> Along the way, I'd like to add some documentation to CMS - it's not very easy to read and follow.
> I'll work on a patch.
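
To illustrate why the doc-based versus byte-based ordering in the issue description matters, here is a small sketch with a pluggable comparator. PendingMerge and MergeOrderingSketch are hypothetical stand-ins, not Lucene's OneMerge or CMS, and sorting smallest-first is an assumption made only for the example.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.Comparator;
import java.util.List;

// Hypothetical stand-in for a pending merge; not Lucene's OneMerge.
final class PendingMerge {
    final String name;
    final int totalDocs;
    final long totalBytes;

    PendingMerge(String name, int totalDocs, long totalBytes) {
        this.name = name;
        this.totalDocs = totalDocs;
        this.totalBytes = totalBytes;
    }
}

final class MergeOrderingSketch {
    // Two interchangeable orderings; a scheduler could expose either as an overridable default.
    static final Comparator<PendingMerge> BY_DOCS = Comparator.comparingInt(m -> m.totalDocs);
    static final Comparator<PendingMerge> BY_BYTES = Comparator.comparingLong(m -> m.totalBytes);

    /** Returns merge names sorted by the given comparator (smallest first, by assumption). */
    static List<String> order(List<PendingMerge> merges, Comparator<PendingMerge> cmp) {
        List<PendingMerge> sorted = new ArrayList<>(merges);
        sorted.sort(cmp);
        List<String> names = new ArrayList<>();
        for (PendingMerge m : sorted) {
            names.add(m.name);
        }
        return names;
    }

    public static void main(String[] args) {
        List<PendingMerge> merges = Arrays.asList(
            new PendingMerge("manySmallDocs", 1_000_000, 50L * 1024 * 1024),
            new PendingMerge("fewLargeDocs", 10_000, 2L * 1024 * 1024 * 1024));
        System.out.println("by docs:  " + order(merges, BY_DOCS));
        System.out.println("by bytes: " + order(merges, BY_BYTES));
    }
}
```

The two comparators rank the same pair of merges in opposite order, which is the situation where a doc-based sort disagrees with a byte-based merge policy.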