lucene-java-commits mailing list archives

From Apache Wiki <wikidi...@apache.org>
Subject [Lucene-java Wiki] Update of "NearRealtimeSearch" by JasonRutherglen
Date Thu, 28 May 2009 02:09:15 GMT
Dear Wiki user,

You have subscribed to a wiki page or wiki category on "Lucene-java Wiki" for change notification.

The following page has been changed by JasonRutherglen:
http://wiki.apache.org/lucene-java/NearRealtimeSearch

------------------------------------------------------------------------------
  = Near Realtime Search =
  
- Near realtime search in Lucene refers to features added to IndexWriter in Lucene 2.9/trunk
that enable updates to be efficiently searched on soon after the updates are completed.  
+ Near realtime (NRT) search in Lucene refers to features added to IndexWriter in Lucene 2.9
that make updates efficiently searchable very soon after they are completed.
 
  
    * Minimize IO overhead
    * Transparent to the user
+   * Efficient RAM management
+   
+ One goal of the near realtime search design is to make NRT as transparent as possible for
the user.  Another is to minimize the time between making an update and being able to search
the index including that update.  A side benefit of the implementation is that the
IndexReader/IndexWriter dichotomy for updating the index is removed.  Previously, users
sometimes had to open an IndexReader for norm and delete updates, close the reader, then open
an IndexWriter for document updates.  Now Lucene offers a unified API: one simply calls
IndexWriter.getReader after updates, and the IndexWriter returns an IndexReader representing
the index plus the new updates.
  
- One goal of the near realtime search design is to make NRT as transparent as possible for
the user.  Another is minimize the time needed after an update is made to search on the index
including the new updates.  A side benefit of the implementation is the IndexReader/IndexWriter
update index dichotomy is removed.  This sometimes required a user to open an IndexReader
for norm and delete updates, and an IndexWriter for document updates.  Now we have a unified
API and one simply calls IndexWriter.getReader after updates and the IndexWriter returns an
IndexReader representing all of the updates. 
+ IndexWriter manages the subreaders internally so there is no need to call IndexReader.reopen.
 Instead of calling reopen, one calls IndexWriter.getReader again.  One benefit of this design
is the efficiency of deletes.  Before NRT, deletes needed to be materialized to disk before
being available for searching; with NRT they are carried over in the segment readers until an
explicit IndexWriter.commit is performed.  This is useful for on-disk segments, where writing
potentially many .del files could become costly in IO over time as they accumulate and need
deleting.  IndexWriter carries the deletes over in RAM in SegmentReader.deletedDocs by making
use of IndexReader.clone.  When clone is used, there is no need to call IndexReader.flush
after every update to the index.
  
- IndexWriter manages the subreaders internally so there is no need to call IndexReader.reopen.
 The benefit of this design is the efficiency of deletes.  Where before NRT, deletes needed
to be materialized to disk before being available for searching, they are now carried over
in the segmentReaders until an explicit IndexWriter.commit is made.  This is useful for on
disk segments where writing potentially many .del files would become IO costly over time.
 With the deletes carried over in RAM by making use of the IndexReader.clone method internally,
there is no need to call IndexReader.flush after every update to the index.
- 
- NRT adds an internal RAMDirectory to IndexWriter where documents are flushed to before being
merged to disk.  This technique decreases the turnaround time required for updating the index
when calling getReader to search the index including those updates.  
+ NRT adds an internal RAMDirectory (LUCENE-1313) to IndexWriter, to which documents are flushed
before being merged to disk.  This shortens the turnaround between updating the index and
calling getReader to search the index including those updates.  
  
  Sample code:
  
@@ -24, +25 @@

  ==== Internals ====
  
    * IndexWriter pools segmentreaders
-   * IndexWriter.getReader flushes changes without calling commit or flushing deletes to
disk
+   * IndexWriter.getReader (LUCENE-1516) flushes changes without calling commit or flushing
deletes to disk
+   * Indexing speeds up because the RAM buffer is written to IndexWriter's internal RAMDirectory,
which is quicker than waiting for it to be written to disk
+   * FileSwitchDirectory (LUCENE-1618) is used by NRT to write potentially large docstores
and term vectors to disk rather than to the RAMDirectory.  This makes more RAM available for
NRT.
+   * IndexReader.clone (LUCENE-1314) is used in IndexWriter to carry deletes over within
segment readers.  It is also used to freeze a version so that a merge may complete and deletes
may be safely applied and searched on concurrently.  
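
The idea of carrying deletes over in RAM through clone, and of freezing a version while newer deletes are applied, can be pictured with a toy model. The sketch below is not Lucene code: ToySegmentReader, cloneWithDelete and NrtDeletesDemo are hypothetical names, and a java.util.BitSet stands in for SegmentReader.deletedDocs. It assumes the simplest possible scheme, in which every clone eagerly copies the bit set; the real implementation may share and copy state differently.

```java
import java.util.BitSet;

/** Toy stand-in for a segment reader. Hypothetical; not Lucene's SegmentReader. */
final class ToySegmentReader {
    private final int maxDoc;          // number of documents in the segment
    private final BitSet deletedDocs;  // deletes held in RAM only, never written to disk

    ToySegmentReader(int maxDoc) {
        this(maxDoc, new BitSet(maxDoc));
    }

    private ToySegmentReader(int maxDoc, BitSet deletedDocs) {
        this.maxDoc = maxDoc;
        this.deletedDocs = deletedDocs;
    }

    /**
     * Returns a new reader that carries over the existing deletes plus one more.
     * This reader is left untouched, so searches running against it see a frozen view.
     */
    ToySegmentReader cloneWithDelete(int docId) {
        if (docId < 0 || docId >= maxDoc) {
            throw new IllegalArgumentException("docId out of range: " + docId);
        }
        BitSet copy = (BitSet) deletedDocs.clone(); // eager copy of the in-RAM deletes
        copy.set(docId);
        return new ToySegmentReader(maxDoc, copy);
    }

    boolean isDeleted(int docId) {
        return deletedDocs.get(docId);
    }

    /** Live documents: total documents minus deleted ones. */
    int numDocs() {
        return maxDoc - deletedDocs.cardinality();
    }
}

final class NrtDeletesDemo {
    public static void main(String[] args) {
        ToySegmentReader frozen = new ToySegmentReader(5);
        ToySegmentReader latest = frozen.cloneWithDelete(2);
        // The older reader still sees all documents; the clone sees the delete.
        System.out.println(frozen.numDocs() + " live docs vs " + latest.numDocs());
    }
}
```

In this model, nothing is written to disk when a delete is applied; a commit step (not shown) would be the only point at which the bit set needed persisting.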
  
