From: Apache Wiki
To: java-commits@lucene.apache.org
Reply-To: java-dev@lucene.apache.org
Date: Thu, 28 May 2009 21:24:19 -0000
Message-ID: <20090528212420.11983.16910@eos.apache.org>
Subject: [Lucene-java Wiki] Trivial Update of "NearRealtimeSearch" by OtisGospodnetic

Dear Wiki user,

You have subscribed to a wiki page or wiki category on "Lucene-java Wiki" for change notification.

The following page has been changed by OtisGospodnetic:
http://wiki.apache.org/lucene-java/NearRealtimeSearch

------------------------------------------------------------------------------
  One goal of the near realtime search design is to make NRT as transparent as possible for the user. Another is minimize the time needed after an update is made to search on the index including the new updates. A side benefit of the implementation is the IndexReader/IndexWriter update index dichotomy is removed. This sometimes required users to open an IndexReader for norm and delete updates, close the reader, then open an IndexWriter for document updates. Now Lucene offers a unified API where one simply calls IndexWriter.getReader after updates and the IndexWriter returns an IndexReader representing the index plus the new updates.

- IndexWriter manages the subreaders internally so there is no need to call IndexReader.reopen. Instead of calling reopen, one calls IndexWriter.getReader again. One benefit of this design is the efficiency of deletes. Where before NRT deletes needed to be materialized to disk before being available for searching, with NRT they are carried over in the segment readers until an explicit IndexWriter.commit is performed. This is useful for on disk segments where writing potentially many .del files could become IO costly over time as they accumulated and required deleting. Deletes are carried over in RAM in SegmentReader.deletedDocs by IndexWriter making use of IndexReader.clone. When clone is used, there here is no need to call IndexReader.flush after every update to the index.
+ IndexWriter manages the subreaders internally so there is no need to call IndexReader.reopen. Instead of calling reopen, one calls IndexWriter.getReader again. One benefit of this design is the efficiency of deletes.
Where before NRT deletes needed to be materialized to disk before being available for searching, with NRT they are carried over in the segment readers until an explicit IndexWriter.commit is performed. This is useful for on disk segments where writing potentially many .del files could become IO costly over time as they accumulated and required deleting. Deletes are carried over in RAM in SegmentReader.deletedDocs by IndexWriter making use of IndexReader.clone. When clone is used, there is no need to call IndexReader.flush after every update to the index.

  NRT adds an internal RAMDirectory (LUCENE-1313) to IndexWriter where documents are flushed to before being merged to disk. This technique decreases the turnaround time required for updating the index when calling getReader to search the index including those updates.

@@ -24, +24 @@

  ==== Internals ====
-  * IndexWriter pools segmentreaders
+  * IndexWriter pools SegmentReaders
   * IndexWriter.getReader (LUCENE-1516) flushes changes without calling commit or flushing deletes to disk
-  * Speedup in indexing because instead of waiting for the ram buffer to be written to disk, the ram buffer is more quickly written to the IndexWriter internal RAMDirectory
+  * Speedup in indexing because instead of waiting for the RAM buffer to be written to disk, the RAM buffer is more quickly written to the IndexWriter internal RAMDirectory
   * FileSwitchDirectory (LUCENE-1618) is used by NRT to write potentially large docstores and term vectors to disk rather than to the RAMDirectory. This makes more RAM available for NRT.
   * IndexReader.clone (LUCENE-1314) is used in IndexWriter to carry deletes over within segment readers. It is also used to freeze a version so that a merge may complete and deletes may be safely applied and searched on concurrently.
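
The behaviour described in the page text above (a reader obtained from the writer sees uncommitted updates, deletes are carried in RAM until commit, and a cloned reader freezes a version) can be illustrated with a small in-memory toy model. This is only a sketch under stated assumptions: ToyWriter, ToyReader, persistedDeletes and everything else in it are hypothetical names made up for illustration. It uses no Lucene classes and does not reflect how IndexWriter or SegmentReader are actually implemented.

```java
import java.util.ArrayList;
import java.util.BitSet;
import java.util.List;

// Toy model only: ToyWriter and ToyReader are hypothetical stand-ins,
// not Lucene classes, and do not reflect Lucene's internals.
class Main {
    public static void main(String[] args) {
        ToyWriter writer = new ToyWriter();
        writer.addDocument("doc1");
        writer.addDocument("doc2");

        // Snapshot taken before the next round of updates.
        ToyReader first = writer.getReader();

        writer.deleteDocument("doc1");
        writer.addDocument("doc3");

        // A second call returns a view that includes the new updates,
        // even though commit() has not been called.
        ToyReader second = writer.getReader();

        System.out.println("first reader:  " + first.liveDocs());
        System.out.println("second reader: " + second.liveDocs());
        System.out.println("deletes persisted before commit: " + writer.persistedDeletes());
        writer.commit();
        System.out.println("deletes persisted after commit:  " + writer.persistedDeletes());
    }
}

/** Holds documents and deletion bits in memory; "persists" deletes only on commit. */
final class ToyWriter {
    private final List<String> docs = new ArrayList<>();
    private final BitSet deleted = new BitSet();
    private int persistedDeletes = 0; // stand-in for deletes written to disk

    void addDocument(String doc) {
        docs.add(doc);
    }

    /** Marks every matching document as deleted, in RAM only. */
    void deleteDocument(String doc) {
        for (int i = 0; i < docs.size(); i++) {
            if (docs.get(i).equals(doc)) {
                deleted.set(i);
            }
        }
    }

    /** Returns a frozen view of the current documents and deletion bits. */
    ToyReader getReader() {
        return new ToyReader(docs, deleted);
    }

    /** Only here are the accumulated deletes counted as persisted. */
    void commit() {
        persistedDeletes = deleted.cardinality();
    }

    int persistedDeletes() {
        return persistedDeletes;
    }
}

/** Immutable snapshot: copies the document list and clones the deletion bits. */
final class ToyReader {
    private final List<String> docs;
    private final BitSet deleted;

    ToyReader(List<String> docs, BitSet deleted) {
        this.docs = new ArrayList<>(docs);
        this.deleted = (BitSet) deleted.clone();
    }

    /** Returns the documents not marked deleted in this snapshot. */
    List<String> liveDocs() {
        List<String> live = new ArrayList<>();
        for (int i = 0; i < docs.size(); i++) {
            if (!deleted.get(i)) {
                live.add(docs.get(i));
            }
        }
        return live;
    }
}
```

In this toy, the first reader keeps reporting doc1 and doc2 after the delete, the second reader reports doc2 and doc3, and the persisted-delete counter only changes when commit() runs. Copying the whole document list on every getReader call is a simplification chosen for brevity; the page describes pooled SegmentReaders and clone for exactly the reason that full copies would be too expensive in a real index.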