Date: Sat, 19 Mar 2011 14:30:29 +0000 (UTC)
From: "Michael McCandless (JIRA)"
To: dev@lucene.apache.org
Message-ID: <1967602936.13661.1300545029574.JavaMail.tomcat@hel.zones.apache.org>
Subject: [jira] Commented: (LUCENE-2573) Tiered flushing of DWPTs by RAM with low/high water marks
Reply-To: dev@lucene.apache.org
Mailing-List: contact dev-help@lucene.apache.org; run by ezmlm

    [ https://issues.apache.org/jira/browse/LUCENE-2573?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13008776#comment-13008776 ]

Michael
McCandless commented on LUCENE-2573:
--------------------------------------------

* I think once we sync up to trunk again, the FP should hold the IW's config instance and pull settings "live" from it? Ie this way we keep our live changes to flush-by-RAM. Also Healthiness (it won't get updates to the RAM buffer now).
* Should we rename *ByRAMFP -> *ByRAMOrDocCountFP, since it "ors" the docCount and RAM usage triggers, right? Oh, I see, not quite -- it requires the RAM buffer be set. I think we should relax that? Ie a single flush policy (the default) flushes by either/or?
* Shouldn't these flush policies also trigger by maxBufferedDelCount?
* Maybe FP.init should throw IllegalStateExc, not IllegalArgExc? (Because no arg is allowed once the "state" of the FP has already been init'd.)
* Probably FP.writer should be a SetOnce?
* Hmm, we still have a FlushPolicy.message? Can't we just make IW protected, and then the FlushPolicy impl can call IW.message? (And also remove FP.setInfoStream.)
* Is IW.FlushControl not really used anymore? We should remove it?
* I still think LW should be 1.0 of your RAM buffer. Ie, IW will start flushing once that much RAM is in use.
* I still see "synchronized (docWriter.flushControl) {" in IndexWriter.
* We should jdoc that IWC.setFlushPolicy takes effect only on init of IW?
* Add a "for testing only" comment to IW.getDocsWriter?
* I wonder whether we should convey "what changed" to the FP? EG, we can 1) buffer a new del term, 2) add a new doc, or 3) both (updateDocument). It could be we have onUpdate, onAdd, onDelete? Or maybe we keep a single method but rename it to onChange? Ie, it's called because *something* about the incoming DWPT has changed.
* The flush policy shouldn't have to compute "delta" RAM like it does now? Actually, why can't it just call flushControl.activeBytes(), and we ensure the delta was already folded into that? Ie we'd call commitPerThreadBytes before FP.visit. (Then commitPerThreadBytes wouldn't ever add to flushBytes, which is sort of spooky -- like flushBytes should get incr'd only when we pull a DWPT out for flushing.)
* I don't think we should ever markAllWritersPending; ie, that's not the right "reaction" when flushing is too slow (eg you're on a slow hard drive), since over time this will result in flushing lots of tiny segments unnecessarily. A better reaction is to stall the incoming threads; this way the flusher threads catch up, and once you resume, the small DWPTs have a chance to get big before they are flushed.
* Misspelled: markLargesWriterPending -> markLargestWriterPending

> Tiered flushing of DWPTs by RAM with low/high water marks
> ---------------------------------------------------------
>
>                 Key: LUCENE-2573
>                 URL: https://issues.apache.org/jira/browse/LUCENE-2573
>             Project: Lucene - Java
>          Issue Type: Improvement
>          Components: Index
>            Reporter: Michael Busch
>            Assignee: Simon Willnauer
>            Priority: Minor
>             Fix For: Realtime Branch
>
>         Attachments: LUCENE-2573.patch, LUCENE-2573.patch, LUCENE-2573.patch, LUCENE-2573.patch, LUCENE-2573.patch, LUCENE-2573.patch, LUCENE-2573.patch
>
>
> Now that we have DocumentsWriterPerThreads we need to track total consumed RAM across all DWPTs.
> A flushing strategy idea that was discussed in LUCENE-2324 was to use a tiered approach:
> - Flush the first DWPT at a low water mark (e.g. at 90% of allowed RAM)
> - Flush all DWPTs at a high water mark (e.g. at 110%)
> - Use linear steps in between the high and low water marks: e.g. when 5 DWPTs are used, flush at 90%, 95%, 100%, 105% and 110%.
> Should we allow the user to configure the low and high water mark values explicitly using total values (e.g. low water mark at 120MB, high water mark at 140MB)? Or shall we keep, for simplicity, the single setRAMBufferSizeMB() config method and use something like 90% and 110% for the water marks?

--
This message is automatically generated by JIRA.
For more information on JIRA, see: http://www.atlassian.com/software/jira

---------------------------------------------------------------------
To unsubscribe, e-mail: dev-unsubscribe@lucene.apache.org
For additional commands, e-mail: dev-help@lucene.apache.org
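[Editor's note] The tiered low/high water-mark scheme described in the quoted issue (linear steps between e.g. 90% and 110% of the RAM buffer across the active DWPTs) can be sketched roughly as below. This is a hypothetical illustration, not code from the patch; the class and method names (TieredWaterMarks, threshold) are made up.

```java
// Hypothetical sketch of the tiered water-mark idea: the i-th of n active
// DWPTs is flushed once total RAM use crosses a threshold that steps
// linearly from the low water mark up to the high water mark.
public class TieredWaterMarks {

    // Returns the flush threshold for the i-th of n DWPTs, as a fraction
    // of the configured RAM buffer (e.g. 0.90 .. 1.10).
    static double threshold(int i, int n, double low, double high) {
        if (n == 1) {
            return low;  // a single DWPT flushes at the low water mark
        }
        return low + (high - low) * i / (n - 1);
    }

    public static void main(String[] args) {
        // With 5 DWPTs and 90%/110% water marks, the thresholds are
        // 90%, 95%, 100%, 105% and 110%, matching the issue's example.
        for (int i = 0; i < 5; i++) {
            System.out.println(Math.round(100 * threshold(i, 5, 0.90, 1.10)));
        }
    }
}
```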
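[Editor's note] The "FP.writer should be a SetOnce" suggestion in the review above refers to a write-once holder (Lucene ships one as org.apache.lucene.util.SetOnce): the field may be assigned exactly once, and any later assignment throws. A minimal self-contained sketch of the idiom, with hypothetical names:

```java
import java.util.concurrent.atomic.AtomicBoolean;

// Minimal sketch of the SetOnce idiom: the value may be written exactly
// once; a second set() throws. Not the actual Lucene class.
public class SetOnceSketch<T> {
    private final AtomicBoolean isSet = new AtomicBoolean(false);
    private volatile T value;

    public void set(T v) {
        // compareAndSet guarantees only one thread can win the first set
        if (!isSet.compareAndSet(false, true)) {
            throw new IllegalStateException("value already set");
        }
        value = v;
    }

    public T get() {
        return value;
    }

    public static void main(String[] args) {
        SetOnceSketch<String> writer = new SetOnceSketch<>();
        writer.set("indexWriter");
        System.out.println(writer.get());      // prints "indexWriter"
        try {
            writer.set("another");             // second set is rejected
        } catch (IllegalStateException e) {
            System.out.println("second set rejected");
        }
    }
}
```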