lucene-java-commits mailing list archives

From Apache Wiki <wikidi...@apache.org>
Subject [Lucene-java Wiki] Update of "ImproveIndexingSpeed" by MikeMcCandless
Date Sun, 03 Feb 2008 22:47:18 GMT
Dear Wiki user,

You have subscribed to a wiki page or wiki category on "Lucene-java Wiki" for change notification.

The following page has been changed by MikeMcCandless:
http://wiki.apache.org/lucene-java/ImproveIndexingSpeed

The comment on the change is:
Just fixing inequality..

------------------------------------------------------------------------------
  
   * '''Open a single writer and re-use it for the duration of your indexing session.'''
  
-  * '''Lucene <= 2.2: Flush by RAM usage instead of document count.'''
+  * '''Flush by RAM usage instead of document count.'''
  
-  Call [http://lucene.apache.org/java/docs/api/org/apache/lucene/index/IndexWriter.html#ramSizeInBytes() writer.ramSizeInBytes()] after every added doc then call [http://lucene.apache.org/java/docs/api/org/apache/lucene/index/IndexWriter.html#flush() flush()] when it's using too much RAM.  This is especially good if you have small docs or highly variable doc sizes.  You need to first set [http://lucene.apache.org/java/docs/api/org/apache/lucene/index/IndexWriter.html#setMaxBufferedDocs(int) maxBufferedDocs] large enough to prevent the writer from flushing based on document count.  However, don't set it too large otherwise you may hit [http://issues.apache.org/jira/browse/LUCENE-845 LUCENE-845].  Somewhere around 2-3X your "typical" flush count should be OK.
+  '''For Lucene <= 2.2''': call [http://lucene.apache.org/java/docs/api/org/apache/lucene/index/IndexWriter.html#ramSizeInBytes() writer.ramSizeInBytes()] after every added doc then call [http://lucene.apache.org/java/docs/api/org/apache/lucene/index/IndexWriter.html#flush() flush()] when it's using too much RAM.  This is especially good if you have small docs or highly variable doc sizes.  You need to first set [http://lucene.apache.org/java/docs/api/org/apache/lucene/index/IndexWriter.html#setMaxBufferedDocs(int) maxBufferedDocs] large enough to prevent the writer from flushing based on document count.  However, don't set it too large otherwise you may hit [http://issues.apache.org/jira/browse/LUCENE-845 LUCENE-845].  Somewhere around 2-3X your "typical" flush count should be OK.
+ 
+  '''For Lucene >= 2.3''': IndexWriter can flush according to RAM usage itself.  Call [http://lucene.apache.org/java/docs/api/org/apache/lucene/index/IndexWriter.html#setRAMBufferSizeMB() writer.setRAMBufferSizeMB()] to set the buffer size.  Be sure you don't also have any leftover calls to setMaxBufferedDocs since the writer will flush "either or" (whichever comes first).
  
   * '''Use as much RAM as you can afford.'''
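
The "either or" flushing described for Lucene >= 2.3 can be illustrated with a small stand-alone model. This is only a sketch and not Lucene's implementation: the class `FlushPolicySketch`, its methods, and the 16 MB / 1 KB figures are all hypothetical, chosen to show why a leftover document-count limit can trigger far more flushes than the RAM budget alone would.

```java
// Toy model of "flush on whichever limit is hit first".
// Hypothetical illustration only; this is NOT Lucene's IndexWriter code.
public class FlushPolicySketch {
    private final int maxBufferedDocs;   // 0 or less means "no doc-count limit"
    private final long ramBufferBytes;   // RAM budget for buffered documents
    private int bufferedDocs = 0;
    private long bufferedBytes = 0;
    private int flushCount = 0;

    public FlushPolicySketch(int maxBufferedDocs, long ramBufferBytes) {
        this.maxBufferedDocs = maxBufferedDocs;
        this.ramBufferBytes = ramBufferBytes;
    }

    /** Buffers one document; returns true if this add triggered a flush. */
    public boolean addDocument(long docBytes) {
        bufferedDocs++;
        bufferedBytes += docBytes;
        boolean docLimitHit = maxBufferedDocs > 0 && bufferedDocs >= maxBufferedDocs;
        boolean ramLimitHit = bufferedBytes >= ramBufferBytes;
        if (docLimitHit || ramLimitHit) {
            // Either trigger empties the buffer, whichever comes first.
            bufferedDocs = 0;
            bufferedBytes = 0;
            flushCount++;
            return true;
        }
        return false;
    }

    public int getFlushCount() {
        return flushCount;
    }

    /** Counts flushes while adding numDocs documents of docBytes each. */
    public static int flushesFor(int maxBufferedDocs, long ramBufferBytes,
                                 int numDocs, long docBytes) {
        FlushPolicySketch writer = new FlushPolicySketch(maxBufferedDocs, ramBufferBytes);
        for (int i = 0; i < numDocs; i++) {
            writer.addDocument(docBytes);
        }
        return writer.getFlushCount();
    }

    public static void main(String[] args) {
        long ram = 16L * 1024 * 1024; // assumed 16 MB budget
        // 10,000 docs of 1 KB each stay under the RAM budget: no flush.
        System.out.println(flushesFor(0, ram, 10_000, 1024));  // prints 0
        // A leftover doc-count limit of 10 forces a flush every 10 docs.
        System.out.println(flushesFor(10, ram, 10_000, 1024)); // prints 1000
    }
}
```

In this model the same 10,000 small documents cause no flush under the RAM budget alone, but 1000 flushes once a document-count limit of 10 is also in effect, which is the behaviour the warning about leftover setMaxBufferedDocs calls is guarding against.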
  
