lucene-java-commits mailing list archives

From Apache Wiki <wikidi...@apache.org>
Subject [Lucene-java Wiki] Update of "NearRealtimeSearch" by JasonRutherglen
Date Wed, 30 Sep 2009 23:53:12 GMT
Dear Wiki user,

You have subscribed to a wiki page or wiki category on "Lucene-java Wiki" for change notification.

The "NearRealtimeSearch" page has been changed by JasonRutherglen:
http://wiki.apache.org/lucene-java/NearRealtimeSearch?action=diff&rev1=7&rev2=8

  Sample code:
  
  {{{
- IndexWriter writer;
+ IndexWriter writer; // create an IndexWriter here
+ Document doc = new Document(); // add fields to the document here
- writer.addDocument(doc); // update
+ writer.addDocument(doc); // update a document
  IndexReader reader = writer.getReader(); // get a reader with the new doc
+ Document addedDoc = reader.document(0);
  }}}
  
  ==== Internals ====
  
    * IndexWriter pools SegmentReaders
    * Field caches are searched at the segment level (LUCENE-1483). They are loaded per segment rather than across all segments (the pre-2.9 behavior), so only new segments need loading after a reopen.
-   * IndexWriter.getReader (LUCENE-1516) flushes changes without calling commit or flushing deletes to disk
+   * IndexWriter.getReader (LUCENE-1516) flushes updates without calling commit or flushing deletes to disk (i.e. doesn't call fsync)
    * Indexing is faster because the RAM buffer is written to IndexWriter's internal RAMDirectory instead of waiting for it to be written to disk.
    * FileSwitchDirectory (LUCENE-1618) is used by NRT to write potentially large doc stores and term vectors to disk rather than to the RAMDirectory. This makes more RAM available for NRT.
    * IndexReader.clone (LUCENE-1314) is used in IndexWriter to carry deletes over within segment readers. It is also used to freeze a version so that a merge can complete while deletes are safely applied and searched concurrently.
    * Cloning bitvectors could rapidly consume heap space if updates are frequent, so LUCENE-1526 divides the bitvector into chunks.
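
The chunking idea in the last item can be sketched in plain Java. This is only an illustration of copy-on-write chunks, not Lucene's BitVector and not the LUCENE-1526 patch; the class name `ChunkedBits` and the methods `cloneShared` and `sharedChunks` are hypothetical, and the sketch is not thread-safe.

```java
import java.util.Arrays;

/** Hypothetical sketch of a chunked copy-on-write bit set; not Lucene's BitVector. */
final class ChunkedBits {
    private final int chunkBits;     // bits per chunk (a positive multiple of 64)
    private final long[][] chunks;   // chunk arrays; references may be shared with clones
    private final boolean[] owned;   // owned[i]: chunk i is private and may be written in place

    ChunkedBits(int size, int chunkBits) {
        if (size < 0 || chunkBits <= 0 || chunkBits % 64 != 0) {
            throw new IllegalArgumentException(
                "size must be >= 0 and chunkBits a positive multiple of 64");
        }
        this.chunkBits = chunkBits;
        int numChunks = (size + chunkBits - 1) / chunkBits;
        this.chunks = new long[numChunks][chunkBits / 64];
        this.owned = new boolean[numChunks];
        Arrays.fill(owned, true); // a fresh instance owns all of its chunks
    }

    private ChunkedBits(int chunkBits, long[][] chunks) {
        this.chunkBits = chunkBits;
        this.chunks = chunks;
        this.owned = new boolean[chunks.length]; // all false: every chunk starts out shared
    }

    /** Cheap clone: copies only the chunk references, not the bits themselves. */
    ChunkedBits cloneShared() {
        // From now on neither side may write a shared chunk in place.
        Arrays.fill(owned, false);
        return new ChunkedBits(chunkBits, chunks.clone());
    }

    /** Sets a bit, copying the affected chunk first if it may still be shared. */
    void set(int index) {
        int c = index / chunkBits;
        if (!owned[c]) {
            chunks[c] = chunks[c].clone(); // copy one chunk, not the whole vector
            owned[c] = true;
        }
        int offset = index % chunkBits;
        chunks[c][offset >>> 6] |= 1L << (offset & 63);
    }

    boolean get(int index) {
        int offset = index % chunkBits;
        return (chunks[index / chunkBits][offset >>> 6] & (1L << (offset & 63))) != 0;
    }

    /** Counts chunks whose backing array is physically shared with {@code other}. */
    int sharedChunks(ChunkedBits other) {
        int n = 0;
        for (int i = 0; i < chunks.length && i < other.chunks.length; i++) {
            if (chunks[i] == other.chunks[i]) {
                n++;
            }
        }
        return n;
    }
}
```

With 10,000 bits and 1,024-bit chunks, setting one bit on a clone copies a single 128-byte chunk and leaves the other nine shared, which is why frequent clones stay cheap when only a few deletes change between them.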
  
+ ==== IO Cache ====
+ 
+ Large merges can evict existing segments from the OS IO cache, so a query that was fast may suddenly become slow because it has to read from the hard drive. One way to address this is a JNI-based Directory that calls fadvise or madvise. The advise calls would let the segment merger tell the OS not to load the segments being merged into the IO cache.
+ 
