From: "Steven Parkes (JIRA)"
To: java-dev@lucene.apache.org
Date: Sat, 18 Aug 2007 11:30:31 -0700 (PDT)
Subject: [jira] Commented: (LUCENE-847) Factor merge policy out of IndexWriter

[ https://issues.apache.org/jira/browse/LUCENE-847?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel#action_12520881 ]

Steven Parkes commented on LUCENE-847:
--------------------------------------

> My feeling is we should not deprecate setUseCompoundFile, setMergeFactor, setMaxMergeDocs

I understood that you didn't want to deprecate them in IndexWriter. I wasn't sure whether you meant that they should be added to the MergePolicy interface. If you do, everything makes sense. Otherwise, it sounds like there's still a cast in there, and I'm not sure about that.

> I think IndexWriter should enforce it? I.e. no merge policy should be allowed to leave segments in other dirs (= an inconsistent index) at point of commit.

I think it's just about code location: since a merge policy might want to factor the directories used into its algorithm, it needs the info, and it will presumably sometimes do the copy itself. Presumably you could provide code in MergePolicyBase so that policies could decide when to copy but wouldn't have to write the copy loop. If you put the code in IndexWriter too, it sounds duplicated, again presuming a policy might sometimes want to do it itself.

> I like that idea :) It fits well w/ the stateless API. I.e., merge policy returns all possible merges and "someone above" takes care of scheduling them.

So it returns a vector of specs? That's essentially what the CMP as an above/below wrapper does. I can see that above/below is strange enough to be less clever (I wasn't trying to be clever so much as backwards compatible) and more insane. Sane is good.

> Hmm. This means each merge policy must know whether it's talking to CMP or IndexWriter underneath? With the stateless approach this wouldn't happen.

Well, I wouldn't so much say it has to know. All it cares about is what merge returns. It doesn't have to know who returned it or why. The only real difference between this and "generate a vector of merges" is that the merge policy can take advantage of merge results immediately in the serial case, whereas if you're generating a vector of merges, it can't know.
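
The stateless alternative discussed above can be illustrated with a minimal Java sketch. In that approach, the merge policy returns every merge it would like done, and "someone above" decides which ones to run. Everything in the sketch is hypothetical. `MergeSpec`, `MergePolicy.findMerges`, `EqualSizeRunPolicy` and `mergeSerially` are illustrative names, not Lucene's API. Segments are modeled only by their document counts, and the toy policy stands in for a real one. The serial caller runs just the lowest proposed merge and then asks the policy again, so the policy still sees the result of each merge. The sketch assumes Java 16+ for the `record`.

```java
import java.util.ArrayList;
import java.util.List;

// Hypothetical types for illustration only; this is not Lucene's MergePolicy API.
public class StatelessMergeSketch {

    /** One proposed merge: `count` adjacent segments starting at index `start`. */
    record MergeSpec(int start, int count) {}

    /** Stateless policy: given current segment sizes, return every merge it would like done. */
    interface MergePolicy {
        List<MergeSpec> findMerges(List<Integer> segmentSizes);
    }

    /** Toy policy (assumption): propose merging each run of mergeFactor adjacent, equal-sized segments. */
    static final class EqualSizeRunPolicy implements MergePolicy {
        private final int mergeFactor;

        EqualSizeRunPolicy(int mergeFactor) {
            this.mergeFactor = mergeFactor;
        }

        @Override
        public List<MergeSpec> findMerges(List<Integer> sizes) {
            List<MergeSpec> specs = new ArrayList<>();
            int i = 0;
            while (i + mergeFactor <= sizes.size()) {
                boolean sameSize = true;
                for (int j = i + 1; j < i + mergeFactor; j++) {
                    if (!sizes.get(j).equals(sizes.get(i))) {
                        sameSize = false;
                        break;
                    }
                }
                if (sameSize) {
                    specs.add(new MergeSpec(i, mergeFactor));
                    i += mergeFactor; // proposed merges never overlap
                } else {
                    i++;
                }
            }
            return specs; // listed left to right, so specs.get(0) is the lowest
        }
    }

    /**
     * Serial "someone above": run only the lowest proposed merge, then ask the
     * policy again, since that merge may change what it wants next.
     * Returns the number of merges performed.
     */
    static int mergeSerially(List<Integer> sizes, MergePolicy policy) {
        int merges = 0;
        while (true) {
            List<MergeSpec> specs = policy.findMerges(sizes);
            if (specs.isEmpty()) {
                return merges;
            }
            MergeSpec lowest = specs.get(0);
            List<Integer> window = sizes.subList(lowest.start(), lowest.start() + lowest.count());
            int merged = 0;
            for (int size : window) {
                merged += size;
            }
            window.clear();                    // drop the source segments...
            sizes.add(lowest.start(), merged); // ...and put the merged segment in their place
            merges++;
        }
    }

    public static void main(String[] args) {
        List<Integer> sizes = new ArrayList<>();
        MergePolicy policy = new EqualSizeRunPolicy(2);
        for (int flush = 0; flush < 7; flush++) {
            sizes.add(1); // each flush adds a one-document segment
            mergeSerially(sizes, policy);
        }
        System.out.println(sizes);
    }
}
```

Seven one-document flushes with a merge factor of 2 leave segments of 4, 2 and 1 documents. A concurrent scheduler could instead start several of the non-overlapping specs from a single `findMerges` call. That is exactly the case where the policy cannot react to intermediate results.
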
Of course, I guess in that case, if IndexWriter gets a vector of merges, it can always take the lowest and ignore the rest, calling the merge policy again in case it wants to request a different set. Then you only have the excess computation for merges you never really considered.

> Oh I see... that's kind of sneaky (planning on using exceptions to abort a merge requested by the policy).

There's always going to be the chance of an exception during a merge; I'm pretty sure of that. But you're right: if the merge policy isn't in the control path, it would never see them. They'll still be there, but out of its path.

> But since you're already doing the work to allow a merge to run in the BG without blocking adding of docs, flushing, etc, wouldn't this come nearly for free?
> I haven't looked at this.
> Well, e.g. flush() now synchronizes on IndexWriter

Yeah, and making it not do so is less than straightforward. I've looked at this code a fair amount and experimented with different ideas, but hadn't gotten all the way to a working model. You can look at locking segmentInfos, but there are many places where segmentInfos is iterated over that would require locks if the lock on IW wasn't sufficient to guarantee that the iteration was safe. I did look at that early on, so maybe my understanding was still too lacking and it's more feasible than I was thinking ...

> Factor merge policy out of IndexWriter
> --------------------------------------
>
>                 Key: LUCENE-847
>                 URL: https://issues.apache.org/jira/browse/LUCENE-847
>             Project: Lucene - Java
>          Issue Type: Improvement
>          Components: Index
>            Reporter: Steven Parkes
>            Assignee: Steven Parkes
>         Attachments: concurrentMerge.patch, LUCENE-847.patch.txt, LUCENE-847.patch.txt, LUCENE-847.txt
>
>
> If we factor the merge policy out of IndexWriter, we can make it pluggable, making it possible for apps to choose a custom merge policy and for easier experimenting with merge policy variants.

-- 
This message is automatically generated by JIRA.