lucene-java-commits mailing list archives

From Apache Wiki <wikidi...@apache.org>
Subject [Lucene-java Wiki] Update of "NearRealtimeSearch" by JasonRutherglen
Date Thu, 28 May 2009 00:03:12 GMT
Dear Wiki user,

You have subscribed to a wiki page or wiki category on "Lucene-java Wiki" for change notification.

The following page has been changed by JasonRutherglen:
http://wiki.apache.org/lucene-java/NearRealtimeSearch

------------------------------------------------------------------------------
  
  Near realtime search (NRT) in Lucene refers to features added to IndexWriter in Lucene 2.9/trunk
that let updates be searched efficiently soon after they are made.  
  
- * Minimize IO overhead
+   * Minimize IO overhead
- * Transparent to the user
+   * Transparent to the user
  
+ One goal of the near realtime search design is to make NRT as transparent as possible for
the user.  Another is to minimize the time between making an update and being able to search
an index that includes it.  A side benefit of the implementation is that the IndexReader/IndexWriter
index-update dichotomy is removed.  Previously a user sometimes had to open an IndexReader
for norm and delete updates, and an IndexWriter for document updates.  Now there is a unified
API: one simply calls IndexWriter.getReader after updates, and the IndexWriter returns an
IndexReader reflecting all of the updates. 
- One goal was to make the system as transparent as possible for users, so the API is fairly
simple.  
- 
- The goal of the near realtime search design is to make NRT as transparent as possible for
the user and minimize IO.  The side benefit was we removed the IndexReader/IndexWriter update
index dichotomy where in some cases one would need to open an IndexReader for norm and delete
updates, and an IndexWriter for document updates.  Now one simply calls IndexWriter.getReader
after updates and the IndexWriter returns an IndexReader representing all of the previous
updates. 
  
  IndexWriter manages the subreaders internally so there is no need to call IndexReader.reopen.
 One benefit of this design is efficient deletes.  Before NRT, deletes had to be written
to disk before they became visible to searches; now they are carried over
in the SegmentReaders until an explicit IndexWriter.commit is made.  This helps with on-disk
segments, where writing potentially many .del files would become costly in IO over time.
 Because the deletes are carried over in RAM, using the IndexReader.clone method internally,
there is no need to call IndexReader.flush after every update to the index.
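
The delete handling described above can be illustrated with a small toy model. This is only a sketch in plain Java, not Lucene code: the class ToyWriter, its fields, and the deleteFilesWritten counter are hypothetical names invented here, and the real IndexWriter works on segments and SegmentReaders rather than a single list. It only shows the idea that a reader obtained from the writer sees deletes held in RAM, while nothing is "written" until commit.

```java
import java.util.ArrayList;
import java.util.HashSet;
import java.util.List;
import java.util.Set;

/** Toy stand-in for a writer that hands out near-realtime views. Not Lucene code. */
class ToyWriter {
    private final List<String> docs = new ArrayList<>();            // all added documents, in order
    private final Set<Integer> ramDeletes = new HashSet<>();        // deletes held only in memory
    private final Set<Integer> committedDeletes = new HashSet<>();  // deletes "written to disk"
    private int deleteFilesWritten = 0;                             // counts simulated delete-file writes

    void addDocument(String doc) {
        docs.add(doc);
    }

    void deleteDocument(int docId) {
        ramDeletes.add(docId); // buffered in RAM, nothing is written yet
    }

    /** Returns the live documents, honoring uncommitted deletes; writes nothing. */
    List<String> getReader() {
        List<String> live = new ArrayList<>();
        for (int i = 0; i < docs.size(); i++) {
            if (!ramDeletes.contains(i) && !committedDeletes.contains(i)) {
                live.add(docs.get(i));
            }
        }
        return live;
    }

    /** Makes buffered deletes durable; one simulated write covers all of them. */
    void commit() {
        if (!ramDeletes.isEmpty()) {
            committedDeletes.addAll(ramDeletes);
            ramDeletes.clear();
            deleteFilesWritten++;
        }
    }

    int deleteFilesWritten() {
        return deleteFilesWritten;
    }

    public static void main(String[] args) {
        ToyWriter w = new ToyWriter();
        w.addDocument("a");
        w.addDocument("b");
        w.deleteDocument(0);
        System.out.println(w.getReader());          // the delete is visible before any commit
        System.out.println(w.deleteFilesWritten()); // still zero simulated writes
    }
}
```

In this model any number of deleteDocument calls followed by getReader costs no simulated write; the counter only moves when commit is called, which mirrors the IO saving the paragraph above describes.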
  
@@ -23, +21 @@

  IndexReader reader = writer.getReader(); // get a reader with the new doc
  }}}
  
- == Internals == 
+ ==== Internals ====
  
- * IndexWriter pools segmentreaders
+   * IndexWriter pools SegmentReaders
- * IndexWriter.getReader flushes changes without calling commit or flushing deletes to disk
+   * IndexWriter.getReader flushes changes without calling commit or flushing deletes to
disk
- * 
  
