Message-ID: <1219559515.1243000126022.JavaMail.jira@brutus>
Date: Fri, 22 May 2009 06:48:46 -0700 (PDT)
From: "Michael McCandless (JIRA)"
To: java-dev@lucene.apache.org
Subject: [jira] Commented: (LUCENE-1313) Realtime Search
In-Reply-To: <1125794672.1214154225042.JavaMail.jira@brutus>

    [ https://issues.apache.org/jira/browse/LUCENE-1313?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12712082#action_12712082 ]

Michael McCandless commented on
LUCENE-1313:
--------------------------------------------

I think generally we are close. I have lots of little comments from
looking through the patch:

  * Can you update the CHANGES entry to something like "IndexWriter
    now uses RAM more efficiently when in near real-time mode"? (Ie
    we don't pass RAMDir to IW).

  * DW.push/getRAMDirSize, RAMTotalMax, RAMBufferAvailable, etc. need
    to be synchronized?

  * Since IW.flushDocStores always goes to the main directory, why
    does it now take a Directory arg?

  * I don't think doAfterFlush should be responsible for calling
    pushRamDirSize(); that's more of a hook for external subclasses.

  * Yes, IW.ramSizeInBytes() should include the ramDir's bytes.

  * There are still places where Directory.contains should be used,
    instead of pulling both dirs and checking each. EG, the assert in
    DW.applyDeletes, and this assert in IW:
{code}
if (ramNrt && merge.directory == switchDirectory) {
  assert !merge.useCompoundFile;
}
{code}
    I'd like to eliminate IW.getInternalDirectory, if possible: to
    anyone interacting with IW, there is only one Directory, and the
    switching is entirely "under the hood".

  * I realized there is in fact a benefit to using CFS in RAM: much
    better RAM efficiency for tiny segments (because RAMDir's buffer
    size is 1 KB). Though such segments would presumably be merged
    away with time, so it may not be a big deal...

  * Is IW.mergeRAMSegmentToDir only for testing?

  * Can you name things theRAMSetting instead of theRamSetting? (Ie,
    RAM is all caps).

  * For IW.resolveRAMSegments, maybe we should make a single merge
    that merges everything down? Why even bother interacting with a
    merge policy, here?

  * Can you rename flush()'s new arg "flushToRAM" to
    "allowFlushToRAM"? Ie, even when this is true, that method may
    decide RAM is full and in fact flush to the real dir.

  * Can you rename IW.ramNRT to IW.flushToRAM? (Since it's in fact
    orthogonal to NRT).
  * It's sneaky to set docWriter.flushToDir before calling
    docWriter.flush; can't we make that an arg to docWriter.flush?
    (And docWriter would never store it).

  * Why did you need to add DW.fileLength?

  * IW.SWITCH_FILE_EXTS should be private static final (not public)?

  * We lost private on a number of attrs in IW -- can you restore?
    (You should insert nocommit comments when you do that, to reduce
    risk that such changes slip in).

  * Likewise for SegmentReader.coreRef.

  * Why did you need to make RAMDir.sizeInBytes volatile? Isn't it
    always updated/accessed from sync(RAMDir) context?

  * Why do we need a new class RAMMergePolicy? (There's no API
    difference over MergePolicy). Can't we simply by default
    instantiate LogByteSizeMergePolicy, and set CFS/CFX to false?

  * IW.fileSwitchDirectory should be private?

  * Have you done any perf tests with flushToRAM = true? EG should we
    enable it by default? I think if we have a good policy for
    managing RAM it could very well be higher performance. But, we
    should explore this under a different issue, so leave the default
    at "no ram dir".

On the "how to share RAM" between RAMDir & DW's RAM buffer... instead
of pre-dividing and growing over time, I think we can simplify it by
logically sharing a single "pool".

The RAMDir only alters its RAM usage when 1) we flush a new segment
to it, 2) a merge completes (either writing to the real dir or to the
ram dir), or 3) deletes are applied to segments in RAM. When such a
change happens we notify DW. DW then adds that base into its RAM
consumption to decide when it's time to flush.

For starters, and we can optimize this later, I don't think DW should
choose on its own to flush itself to the RAMDir? That should only
happen when getReader is called, and there's still plenty of RAM
free.

So what happens is... each time getReader() is called, we make a new
smallish RAM segment. Over time, these RAM segments need merging so
we merge them.
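
To make the single-pool idea concrete, here is a minimal Java sketch of the accounting described above. It is only an illustration under stated assumptions: SharedRamPool and its methods are hypothetical names, not part of the patch or of Lucene, and the real notification path between the RAMDir and DW may look quite different.

```java
// Hypothetical sketch: none of these names exist in Lucene or in the patch.
final class SharedRamPool {
    private final long maxBytes;  // total budget shared by the RAMDir and DW's buffer
    private long ramDirBytes;     // "base": bytes held by segments in the RAMDir
    private long bufferBytes;     // bytes held by DW's in-memory buffer

    SharedRamPool(long maxBytes) {
        if (maxBytes <= 0) {
            throw new IllegalArgumentException("maxBytes must be positive");
        }
        this.maxBytes = maxBytes;
    }

    // Called when the RAMDir size changes: a segment was flushed to it,
    // a merge completed, or deletes were applied to segments in RAM.
    synchronized void onRamDirChanged(long newRamDirBytes) {
        ramDirBytes = newRamDirBytes;
    }

    // Called by the document writer as its buffer grows or is flushed.
    synchronized void onBufferChanged(long newBufferBytes) {
        bufferBytes = newBufferBytes;
    }

    // Total consumption: RAMDir base plus the writer's own buffer.
    synchronized long used() {
        return ramDirBytes + bufferBytes;
    }

    // True once the combined consumption reaches the shared limit.
    synchronized boolean atLimit() {
        return used() >= maxBytes;
    }

    public static void main(String[] args) {
        SharedRamPool pool = new SharedRamPool(16L << 20); // 16 MB budget
        pool.onBufferChanged(4L << 20);
        pool.onRamDirChanged(10L << 20);
        System.out.println("at limit: " + pool.atLimit()); // 14 MB used
        pool.onBufferChanged(6L << 20);
        System.out.println("at limit: " + pool.atLimit()); // 16 MB used
    }
}
```

Every accessor is synchronized because, as noted in the list above, the size bookkeeping is touched from both flush and merge threads. Nothing is pre-divided: either side can consume the whole budget if the other is idle.
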
(If such a merge is fairly large, probably instead of writing to ram
it should write the new segment to the real dir, since intermediate
RAM usage will be too high).

At some point, DW detects that the RAMDir size plus its own buffer is
at the limit. If DW's buffer is relatively small, it should probably
simply flush to the RAMDir then dump entire RAMDir to the real dir as
a single merge. If DW's buffer is big, as would happen if you opened
an NRT reader but never actually called getReader(), it should flush
straight to the real dir.

One challenge we face is ensuring that while we are flushing all ram
segments to disk, we don't block the getReader() turnaround. IE we
can't make getReader() do that flush synchronously. So that needs to
be a BG merge, but we must somehow temporarily disregard the size of
those segments while the merge is running. Or, perhaps we "merge RAM
segments to disk" a bit early, eg once RAM consumed is > 90% of the
total RAM buffer, or something.

> Realtime Search
> ---------------
>
>                 Key: LUCENE-1313
>                 URL: https://issues.apache.org/jira/browse/LUCENE-1313
>             Project: Lucene - Java
>          Issue Type: New Feature
>          Components: Index
>    Affects Versions: 2.4.1
>            Reporter: Jason Rutherglen
>            Priority: Minor
>             Fix For: 2.9
>
>         Attachments: LUCENE-1313.jar, LUCENE-1313.patch, LUCENE-1313.patch, LUCENE-1313.patch, LUCENE-1313.patch, LUCENE-1313.patch, LUCENE-1313.patch, LUCENE-1313.patch, LUCENE-1313.patch, LUCENE-1313.patch, LUCENE-1313.patch, LUCENE-1313.patch, lucene-1313.patch, lucene-1313.patch, lucene-1313.patch, lucene-1313.patch
>
>
> Enable near realtime search in Lucene without external
> dependencies. When RAM NRT is enabled, the implementation adds a
> RAMDirectory to IndexWriter. Flushes go to the ramdir unless
> there is no available space. Merges are completed in the ram
> dir until there is no more available ram.
> IW.optimize and IW.commit flush the ramdir to the primary
> directory, all other operations try to keep segments in ram
> until there is no more space.
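
For reference, the flush policy proposed in the comment (flush to the RAMDir only when getReader() is called and RAM is free; at the limit, pick the destination by how big DW's buffer is; optionally start merging RAM segments to disk early) could be sketched roughly as below. This is a sketch under assumptions, not the patch's logic: FlushPlanner, Action, and both methods are hypothetical names, the 25% cutoff for a "relatively small" buffer is invented for illustration, "plenty of RAM free" is simplified to "below the limit", and the 90% figure is only the example value floated in the comment.

```java
// Hypothetical sketch: these names and thresholds are illustrative assumptions.
final class FlushPlanner {
    enum Action {
        KEEP_BUFFERING,                  // nothing to do yet
        FLUSH_TO_RAM,                    // make a new smallish RAM segment
        FLUSH_TO_RAM_THEN_MERGE_TO_DISK, // flush, then dump the whole RAMDir as one merge
        FLUSH_TO_DISK                    // bypass the RAMDir entirely
    }

    // Assumption: "relatively small" means under a quarter of the total budget.
    static final double SMALL_BUFFER_FRACTION = 0.25;

    // Assumption: early background merge starts above 90% of the budget.
    static final double EARLY_MERGE_FRACTION = 0.9;

    static Action plan(long ramDirBytes, long bufferBytes, long maxBytes,
                       boolean readerRequested) {
        long used = ramDirBytes + bufferBytes;
        if (used >= maxBytes) {
            // At the limit: destination depends on the size of the writer's buffer.
            boolean smallBuffer = bufferBytes < SMALL_BUFFER_FRACTION * maxBytes;
            return smallBuffer ? Action.FLUSH_TO_RAM_THEN_MERGE_TO_DISK
                               : Action.FLUSH_TO_DISK;
        }
        // Below the limit the writer never flushes to RAM on its own;
        // only a getReader() call triggers a new RAM segment.
        return readerRequested ? Action.FLUSH_TO_RAM : Action.KEEP_BUFFERING;
    }

    // Whether a background merge of RAM segments to disk should start early,
    // so that getReader() is not blocked once the pool is actually full.
    static boolean shouldStartBackgroundMerge(long ramDirBytes, long bufferBytes,
                                              long maxBytes) {
        return ramDirBytes + bufferBytes > EARLY_MERGE_FRACTION * maxBytes;
    }

    public static void main(String[] args) {
        System.out.println(plan(900, 100, 1000, false));
        System.out.println(plan(200, 800, 1000, false));
        System.out.println(plan(100, 100, 1000, true));
        System.out.println(shouldStartBackgroundMerge(500, 450, 1000));
    }
}
```

The sketch does not address the open question raised in the comment, namely how to disregard the size of segments that are currently being merged to disk in the background.
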