lucene-dev mailing list archives

From "Ivaylo Zlatev" <IZla...@entigen.com>
Subject new version of IndexWriter.java
Date Tue, 26 Feb 2002 19:33:48 GMT

Yesterday, inspired by the conversation on the dev list about indexing in
memory, I wrote a new version of IndexWriter.java, named
IndexWriter2.java. The file is attached. The code is stable and worth a
try. The following is from the Javadoc for this file:

/**
 * IndexWriter2 is a modification of the original IndexWriter that comes
 * with Lucene. Like IndexWriter, it uses a RAMDirectory. The original
 * IndexWriter treats the segments in the RAMDirectory no differently from
 * the segments in the target directory, where the index is being built.
 * For example, it ALWAYS merges RAMDirectory segments into the target
 * directory. Here, we optimize the usage of RAMDirectory in the following
 * way:<br>
 *
 * When a new Document is added, a new segment for it is created in the
 * RAMDirectory. When the RAMDirectory has collected 'maxDocsInRam' (an
 * important new setting; the default is 10000) 1-document segments,
 * IndexWriter2 merges them into one segment of maxDocsInRam documents
 * inside the RAMDirectory (this is the difference from IndexWriter). It
 * then moves this segment from the RAMDirectory to the target directory
 * (usually a file-system directory). This way, during indexing,
 * IndexWriter2 writes segments of equal size (maxDocsInRam documents each)
 * to the target directory. In other words, during indexing only one
 * file-system segment is open and being written at a time, which uses just
 * a few file handles. No more "Too many open files" exceptions.<br>
 *
 * After indexing is finished, it is good to call optimize() to merge all
 * created segments into one. The RAMDirectory is out of the picture here
 * and is not used. This is where the mergeFactor setting comes in: a total
 * of mergeFactor+1 segments are merged at once into one new segment, and
 * this repeats in a loop until only one segment is left. Here you can get
 * a "Too many open files" exception if your mergeFactor is large. If you
 * set mergeFactor to 1, only 2 segments are merged at a time, which
 * conserves file handles but is a bit slower than a merge with
 * mergeFactor=10, for example.<br>
 *
 * Originally, at the end of mergeSegments() there was code that, if a
 * segment file could not be deleted (because it was still open on
 * Windows), stored its name in a file named 'deletable' so that deletion
 * could be retried later. I believe this was needed because of a bug where
 * the merged segments were not closed properly. In any case, there are now
 * no problems deleting these files on Windows, so the code that reads and
 * writes the 'deletable' file is commented out.<br>
 *
 * @author Ivaylo Zlatev (ivaylo_zlatev@yahoo.com)
 */
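The indexing-time policy in the Javadoc above can be illustrated with a small simulation. This is a sketch under stated assumptions, not the attached IndexWriter2 code: segments are modeled only by their document counts, the class name BufferedSegmentWriter is hypothetical, and it assumes that close() flushes any partially filled RAM buffer as a final, smaller segment (the Javadoc does not say what happens to leftover documents).

```java
import java.util.ArrayList;
import java.util.List;

/**
 * Hypothetical simulation of the IndexWriter2 buffering policy.
 * Segments are represented by document counts; no real Lucene classes are used.
 */
public class BufferedSegmentWriter {
    private final int maxDocsInRam;
    // Stand-in for the RAMDirectory: each entry is a 1-document segment.
    private final List<Integer> ramSegments = new ArrayList<>();
    // Stand-in for the target (file-system) directory: doc count per flushed segment.
    private final List<Integer> targetSegments = new ArrayList<>();

    public BufferedSegmentWriter(int maxDocsInRam) {
        if (maxDocsInRam < 1) {
            throw new IllegalArgumentException("maxDocsInRam must be >= 1");
        }
        this.maxDocsInRam = maxDocsInRam;
    }

    /** Adds one document as a new 1-document segment in RAM. */
    public void addDocument() {
        ramSegments.add(1);
        if (ramSegments.size() >= maxDocsInRam) {
            flushRam();
        }
    }

    /** Merges all RAM segments into one and moves it to the target directory. */
    private void flushRam() {
        if (ramSegments.isEmpty()) {
            return;
        }
        int merged = 0;
        for (int docs : ramSegments) {
            merged += docs; // the merge happens entirely in RAM
        }
        ramSegments.clear();
        targetSegments.add(merged); // a single segment written to the target
    }

    /** Assumption: close() flushes whatever is still buffered in RAM. */
    public void close() {
        flushRam();
    }

    public List<Integer> targetSegments() {
        return new ArrayList<>(targetSegments);
    }

    public static void main(String[] args) {
        BufferedSegmentWriter writer = new BufferedSegmentWriter(10000);
        for (int i = 0; i < 25000; i++) {
            writer.addDocument();
        }
        writer.close();
        System.out.println(writer.targetSegments());
    }
}
```

Running main() prints [10000, 10000, 5000]: every segment written to the target directory during indexing holds maxDocsInRam documents, except possibly the last one.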

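The optimize() loop from the Javadoc can be sketched the same way. Again this is a hypothetical model, not the attached code: the class name OptimizeSketch is made up, segments are reduced to document counts, and merging the most recently added segments first is an assumption. The return value, the total number of documents written by all merges, is only a rough proxy for merge cost.

```java
import java.util.ArrayList;
import java.util.Collections;
import java.util.List;

/**
 * Hypothetical model of the optimize() loop: repeatedly merge up to
 * mergeFactor+1 segments into one until a single segment remains.
 */
public class OptimizeSketch {

    /**
     * Merges the given (mutable) segment list in place until one segment
     * remains; returns the total number of documents written by all merges.
     */
    public static long optimize(List<Integer> segments, int mergeFactor) {
        if (mergeFactor < 1) {
            throw new IllegalArgumentException("mergeFactor must be >= 1");
        }
        int batch = mergeFactor + 1; // segments opened together in one merge
        long docsWritten = 0;
        while (segments.size() > 1) {
            // Assumption: merge the most recently added segments first.
            int from = Math.max(0, segments.size() - batch);
            List<Integer> inputs = segments.subList(from, segments.size());
            int merged = 0;
            for (int docs : inputs) {
                merged += docs;
            }
            inputs.clear();       // removes the merged inputs from 'segments'
            segments.add(merged); // appends the new combined segment
            docsWritten += merged;
        }
        return docsWritten;
    }

    public static void main(String[] args) {
        for (int mergeFactor : new int[] {1, 10}) {
            List<Integer> segments = new ArrayList<>(Collections.nCopies(20, 10000));
            long written = optimize(segments, mergeFactor);
            System.out.println("mergeFactor=" + mergeFactor + ": " + written
                    + " docs written, final segments " + segments);
        }
    }
}
```

Starting from 20 segments of 10000 documents, mergeFactor=1 needs 19 two-way merges and writes 2,090,000 documents in total, while mergeFactor=10 needs 2 merges and writes 310,000, at the price of up to 11 segments open at once. That is the trade-off the Javadoc describes: fewer open files versus more merge work.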

Two weeks ago I sent an improved PriorityQueue that fixes important memory
issues and much more. I just wasted my time: there was no response at all.
Hopefully this time my code will be more useful.

Regards, Ivaylo
 <<IndexWriter2.java>> 
