lucene-java-commits mailing list archives

From Apache Wiki <wikidi...@apache.org>
Subject [Lucene-java Wiki] Update of "PainlessIndexing" by MikeMcCandless
Date Sun, 13 Jan 2008 17:24:37 GMT
Dear Wiki user,

You have subscribed to a wiki page or wiki category on "Lucene-java Wiki" for change notification.

The following page has been changed by MikeMcCandless:
http://wiki.apache.org/lucene-java/PainlessIndexing

------------------------------------------------------------------------------
+ See ImproveIndexingSpeed.
- IndexWriter has a useful method called (at least temporarily) '''setMinMergeDocs'''
- that should be used in order to avoid file handles problems and reduce
- indexing time.
  
- File handles problem is often due to the fact that people use large '''mergeFactor''' 
- values in order to speed up indexation.  The maximum number of open files while merging is around mergeFactor * (5 + number of indexed fields), 
- which can be too much for the FSDirectory.
- 
- By setting a higher value to '''minMergeDocs''', you'll index and merge with a
- RAMDirectory which is internally used by the IndexWriter. When the limit set by '''minMergeDocs''' is reached (ex 1000) a segment is written in
- the FS. '''mergeFactor''' controls the number of segments to be merged, so when
- you have 10 segments on the FS (which is already 10x1000 docs), the
- IndexWriter will merge them all into a single segment. This is equivalent to
- an optimize I think. The process continues like that until it's finished.
- 
- Combining these parameters should be enough to achieve good performance.
- The good point of using '''minMergeDocs''' is that you make a heavy use of the
- RAMDirectory used by your IndexWriter (== fast) without having to be too
- careful with the RAM (which would be the case with RAMDirectory). At the
- same time keeping your mergeFactor low, limits the risk of too many file handles
- problems.
- 
- <hint given by JulienNioche>
- 
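
For readers of this notification, the arithmetic in the removed text can be sketched as follows. This is only an illustration of the rule of thumb quoted above (mergeFactor * (5 + number of indexed fields)) and of the 10 x 1000 docs example. The class name, method names and the field count of 20 are hypothetical, and no Lucene API is called.

```java
public class MergeFileEstimate {

    // Rule of thumb quoted in the removed wiki text:
    // open files while merging ~= mergeFactor * (5 + number of indexed fields).
    static int estimateOpenFiles(int mergeFactor, int indexedFields) {
        return mergeFactor * (5 + indexedFields);
    }

    // Documents sitting in on-disk segments when the first on-disk merge
    // is triggered: mergeFactor segments of minMergeDocs documents each.
    static long docsAtFirstDiskMerge(int minMergeDocs, int mergeFactor) {
        return (long) minMergeDocs * mergeFactor;
    }

    public static void main(String[] args) {
        int indexedFields = 20; // hypothetical field count, for illustration only

        // A low mergeFactor keeps the estimate modest ...
        System.out.println(estimateOpenFiles(10, indexedFields));   // prints 250
        // ... while a large one multiplies it.
        System.out.println(estimateOpenFiles(100, indexedFields));  // prints 2500

        // The example from the text: minMergeDocs = 1000, mergeFactor = 10.
        System.out.println(docsAtFirstDiskMerge(1000, 10));         // prints 10000
    }
}
```

Under these assumptions, raising mergeFactor from 10 to 100 takes the estimate from 250 to 2500 open files, which is why the removed text suggests raising minMergeDocs instead and keeping mergeFactor low.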
