lucene-dev mailing list archives

From "Ivaylo Zlatev" <>
Subject new version of
Date Tue, 26 Feb 2002 19:33:48 GMT

Yesterday I was inspired by the conversation on the dev list about
indexing in memory, etc.,
and I wrote a new version of IndexWriter (it is named IndexWriter2).
Find the attached file here. The code is stable and
worth a try. The following is from the Javadoc for this file:

 * IndexWriter2 is a modification of the original IndexWriter that comes
 * with Lucene. It benefits from a RAMDirectory, which IndexWriter has
 * as well. The original IndexWriter treats the segments in the
 * RAMDirectory no differently from the segments in the target directory,
 * where the index is being built. For example, it ALWAYS merges
 * RAMDirectory segments into the target directory. Here, we optimize the
 * usage of the RAMDirectory in the following way:<br>
 * When a new Document is added, a new segment for it is created in the
 * RAMDirectory. When the RAMDirectory collects 'maxDocsInRam' (this is a
 * new, important setting; the default is 10000) 1-document segments,
 * IndexWriter2 will merge them into one 10000-document segment in the
 * RAMDirectory (here is a difference from IndexWriter). Then it moves
 * that segment from the RAMDirectory to the target directory (usually a
 * file directory). This way, during indexing, IndexWriter2 will be writing
 * segments of equal size (equal to maxDocsInRam) to the target directory.
 * In other words, during indexing only one file-system segment is opened
 * and dealt with, which uses just a few file handles. No more "Too many
 * open files" exceptions.<br>
 * After indexing is finished, it is good to call optimize() to merge the
 * created segments into one. The RAMDirectory is out of the picture here
 * and is not being used. Here is where we use the mergeFactor setting:
 * a total of mergeFactor+1 segments will be merged at once into one new
 * segment. This happens in a loop, until only 1 segment is left.
 * Here you can get a "Too many open files" exception if your mergeFactor
 * is large. If you set mergeFactor to 1, it will merge only 2 segments at
 * a time, which will conserve file handles, but will be a bit slower than
 * a merge with mergeFactor=10, for example.<br>
 * At the end of mergeSegments() there was originally some code where, if a
 * segment file can't be deleted (because it is currently open), it
 * stores its name in a file named 'deletable', so that it can try to
 * delete it later. I believe there was some bug with not closing the
 * segments properly, which was the reason for all of this. Anyway, now
 * there are no problems with deleting these files on Windows, and
 * therefore the code that reads and writes the 'deletable' file is
 * commented out.<br>
 * @author Ivaylo Zlatev (
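
The flow described in the Javadoc above can be illustrated with a small toy model. This is only a sketch and not the attached IndexWriter2 code: it uses no Lucene classes, and a segment is represented by nothing more than its document count. The class name SegmentFlowSketch and all of its methods are hypothetical; only the maxDocsInRam and mergeFactor settings come from the description. Two details are assumptions: leftover in-memory documents are flushed at the start of optimize(), and optimize() merges the oldest segments first.

```java
import java.util.ArrayList;
import java.util.List;

// Toy model only: a "segment" is just its document count.
// Every name here is hypothetical except maxDocsInRam and mergeFactor.
class SegmentFlowSketch {
    private final int maxDocsInRam;
    private final int mergeFactor;
    private final List<Integer> ramSegments = new ArrayList<>();
    private final List<Integer> diskSegments = new ArrayList<>();
    private int maxMergedAtOnce = 0;

    SegmentFlowSketch(int maxDocsInRam, int mergeFactor) {
        if (maxDocsInRam < 1 || mergeFactor < 1) {
            throw new IllegalArgumentException("both settings must be >= 1");
        }
        this.maxDocsInRam = maxDocsInRam;
        this.mergeFactor = mergeFactor;
    }

    // Each added document becomes a 1-document segment in RAM.
    public void addDocument() {
        ramSegments.add(1);
        if (ramSegments.size() >= maxDocsInRam) {
            flushRam();
        }
    }

    // Merge all RAM segments into one and move it to the target directory.
    private void flushRam() {
        if (ramSegments.isEmpty()) {
            return;
        }
        int docs = 0;
        for (int n : ramSegments) {
            docs += n;
        }
        ramSegments.clear();
        diskSegments.add(docs);
    }

    // Merge mergeFactor+1 segments at a time until one segment is left.
    public void optimize() {
        flushRam(); // assumption: leftover RAM documents are written first
        while (diskSegments.size() > 1) {
            int batch = Math.min(mergeFactor + 1, diskSegments.size());
            int merged = 0;
            for (int i = 0; i < batch; i++) {
                merged += diskSegments.remove(0); // assumption: oldest first
            }
            diskSegments.add(0, merged);
            maxMergedAtOnce = Math.max(maxMergedAtOnce, batch);
        }
    }

    public List<Integer> diskSegments() {
        return new ArrayList<>(diskSegments);
    }

    // Largest number of segments read in a single merge step so far.
    public int maxMergedAtOnce() {
        return maxMergedAtOnce;
    }

    public static void main(String[] args) {
        SegmentFlowSketch writer = new SegmentFlowSketch(10, 1);
        for (int i = 0; i < 25; i++) {
            writer.addDocument();
        }
        System.out.println("before optimize: " + writer.diskSegments());
        writer.optimize();
        System.out.println("after optimize:  " + writer.diskSegments());
        System.out.println("max merged at once: " + writer.maxMergedAtOnce());
    }
}
```

In this model, the number of segments read in one merge step never exceeds mergeFactor+1, which is the property the description relies on to limit open file handles during optimize().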

Two weeks ago I sent an improved PriorityQueue, fixing important memory
issues and much more. I just wasted my time - no response at all.
Hopefully this time my code will be more useful.

Regards, Ivaylo
