Return-Path: Delivered-To: apmail-lucene-java-dev-archive@www.apache.org Received: (qmail 17231 invoked from network); 7 Aug 2007 16:36:29 -0000 Received: from hermes.apache.org (HELO mail.apache.org) (140.211.11.2) by minotaur.apache.org with SMTP; 7 Aug 2007 16:36:28 -0000 Received: (qmail 93863 invoked by uid 500); 7 Aug 2007 16:36:24 -0000 Delivered-To: apmail-lucene-java-dev-archive@lucene.apache.org Received: (qmail 93811 invoked by uid 500); 7 Aug 2007 16:36:24 -0000 Mailing-List: contact java-dev-help@lucene.apache.org; run by ezmlm Precedence: bulk List-Help: List-Unsubscribe: List-Post: List-Id: Reply-To: java-dev@lucene.apache.org Delivered-To: mailing list java-dev@lucene.apache.org Received: (qmail 93800 invoked by uid 99); 7 Aug 2007 16:36:24 -0000 Received: from nike.apache.org (HELO nike.apache.org) (192.87.106.230) by apache.org (qpsmtpd/0.29) with ESMTP; Tue, 07 Aug 2007 09:36:24 -0700 X-ASF-Spam-Status: No, hits=-100.0 required=10.0 tests=ALL_TRUSTED X-Spam-Check-By: apache.org Received: from [140.211.11.4] (HELO brutus.apache.org) (140.211.11.4) by apache.org (qpsmtpd/0.29) with ESMTP; Tue, 07 Aug 2007 16:36:17 +0000 Received: from brutus (localhost [127.0.0.1]) by brutus.apache.org (Postfix) with ESMTP id C54277141FE for ; Tue, 7 Aug 2007 09:36:01 -0700 (PDT) Message-ID: <16956635.1186504561805.JavaMail.jira@brutus> Date: Tue, 7 Aug 2007 09:36:01 -0700 (PDT) From: "Michael McCandless (JIRA)" To: java-dev@lucene.apache.org Subject: [jira] Commented: (LUCENE-847) Factor merge policy out of IndexWriter In-Reply-To: <24621308.1174677392992.JavaMail.jira@brutus> MIME-Version: 1.0 Content-Type: text/plain; charset=utf-8 Content-Transfer-Encoding: 7bit X-Virus-Checked: Checked by ClamAV on apache.org [ https://issues.apache.org/jira/browse/LUCENE-847?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel#action_12518184 ] Michael McCandless commented on LUCENE-847: ------------------------------------------- Some more feedback: - Is the separate IndexMerger interface really necessary? Can't we just use IndexWriter directly? It's somewhat awkward/forced now, because the interface has getters for ramBufferSizeMB and maxBufferedDocs that are really a "writer" (flushing) thing not a "merging" thing. While LogDocMergePolicy does need "maxBufferedDocs", I think, instead, in IndexWriter's "setMaxBufferedDocs()" it should "write through" to the LogDocMergePolicy if that is the merge policy in use (and LogDocMergePolicy should store its own private "minMergeDocs"). I think the three getters may not even be needed (based on comments below), in which case it seems like we shouldn't be creating a new interface. - In LogDocMergePolicy, it seems like the checking that's done for whether a SegmentInfo is in a different directory, as well as the subsequent copy to move it over to the IndexWriter's directory, should not live in the MergePolicy? Otherwise, every MergePolicy is going to have to duplicate this code. Not to mention, we may someday create a more efficient way to copy than running a single-segment merge (which is a very inefficient, but, we only do it in addIndexes* I think). I'd like to capture this in one place (IndexWriter). EG, the writer could have its own "copyExternalSegments" method which is called in addIndexes* and also optimize(), after the merge policy has done its merge, which does the check for wrong directory and subsequent copy. I think this would mean IndexMerger.getDirectory() is not really needed. - The "checkOptimize" method in LogDocMergePolicy seems like it belongs back in IndexWriter: I think we don't want every MergePolicy having to replicate that tricky while condition. Maybe we could change MergePolicy.merge to take a boolean "forced" which, if true, means that the MergePolicy must now pick a merge even if it didn't think it was time to. This would let us move that tricky while condition logic back into IndexWriter. Also, I think at some point we may want to have an optimize() method that takes an int parameter stating the max # segments in the resulting index. This would allow you to optimize down to <= N segments w/o paying full cost of a complete "only one segment" optimize. If we had a "forced" boolean then we could build such an optimize method in the future, whereas as it stands now it would not be so easy to retrofit previously created MergePolicy classes to do this. And some more minor feedback: - I love the simplification of addIndexesNoOptimize :) Though (same comment as above) I think we should move that final "copy if different directory" step back in IndexWriter. - There are some minor things that should not be committed eg the added "infoStream = null" in IndexFileDeleter. I typically try to put a comment "// nocommit" above such changes as I make them to remind myself and keep them from slipping in. - In the deprecated IndexWriter methods you're doing a hard cast to LogDocMergePolicy which gives a bad result if you're using a different merge policy; maybe catch the class cast exception (or, better, check upfront if it's an instanceof) and raise a more reasonable exception if not? - IndexWriter around line 1908 has for loop that has commented out "System.err.println"; we should just comment out/remove for loop - These commented out synchronized spook me a bit: /* synchronized(segmentInfos) */ { Are they needed in these contexts? Is this only once we have concurrent merging? (This ties back to the first feedback to simplify MergePolicy API so that this kind of locking is only needed inside IndexWriter). - Can we support non-contiguous merges? If I understand it correctly, the MergeSpecification can express such a merge, it's just that the current IndexMerger (IndexWriter) cannot execute it, right? So we are at least not precluding fixing this in the future. - It confuses me that MergePolicy has a method "merge(...)" -- can we rename it to "maybeMerge(..)" or "checkForMerge(...)"? Ie, this method determines whether a merge is necessary and, if so, it then asks IndexMerger to in fact execute the merge (or, returns the spec)? - Instead of IndexWriter.releaseMergePolicy() can we have IndexWriter only close the merge policy if it was the one that had created it? (Similar to how IndexWriter closes the dir if it has opened it from a String or File, but does not close it if it was passed in). > Factor merge policy out of IndexWriter > -------------------------------------- > > Key: LUCENE-847 > URL: https://issues.apache.org/jira/browse/LUCENE-847 > Project: Lucene - Java > Issue Type: Improvement > Components: Index > Reporter: Steven Parkes > Assignee: Steven Parkes > Attachments: concurrentMerge.patch, LUCENE-847.patch.txt, LUCENE-847.txt > > > If we factor the merge policy out of IndexWriter, we can make it pluggable, making it possible for apps to choose a custom merge policy and for easier experimenting with merge policy variants. -- This message is automatically generated by JIRA. - You can reply to this email to add a comment to the issue online. --------------------------------------------------------------------- To unsubscribe, e-mail: java-dev-unsubscribe@lucene.apache.org For additional commands, e-mail: java-dev-help@lucene.apache.org