lucene-dev mailing list archives

From Otis Gospodnetic <otis_gospodne...@yahoo.com>
Subject Re: new version of IndexWriter.java
Date Tue, 26 Feb 2002 19:53:27 GMT
Ivaylo,

Thanks for the contribution.  It sounds good, although I haven't looked
at it yet.  Do you have any performance numbers?  I'm curious how it
compares to the original IndexWriter.

As for your PriorityQueue, it's still sitting flagged in my Lucene
folder for review.
I've been meaning to send a reply with the following question, not just
for you, but for Doug and others as well:
Is there anything special, anything Lucene-specific in that
PriorityQueue?  If not, there is a PriorityQueue implementation in
Jakarta's Commons Collections sub-project which we could (re)use
instead of having our own.  On the other hand, this requires that we
include the collections jar in lib.

Just some thoughts.
In any case, sorry for not replying; the contribution _is_ appreciated.

Otis


--- Ivaylo Zlatev <IZlatev@entigen.com> wrote:
> 
> Yesterday I was inspired by the conversation on the dev list about
> indexing in memory, etc., and I wrote a new version of IndexWriter.java
> (it is named IndexWriter2.java). The file is attached. The code is
> stable and worth a try. The following is from the Javadoc for this file:
> 
> /**
>  * IndexWriter2 is a modification of the original IndexWriter that comes
>  * with Lucene. It benefits from a RAMDirectory, which IndexWriter has
>  * as well. The original IndexWriter treats the segments in the
>  * RAMDirectory no differently from the segments in the target directory,
>  * where the index is being built. For example, it ALWAYS merges
>  * RAMDirectory segments in the target directory. Here, we optimize the
>  * usage of RAMDirectory in the following way:<br>
>  *
>  * When a new Document is added, a new segment for it is created in
>  * RAMDirectory. When the RAMDirectory collects 'maxDocsInRam' (this is a
>  * new, important setting; the default is 10000) 1-document segments,
>  * IndexWriter2 will merge them into one 10000-document segment in
>  * RAMDirectory (here is a difference from IndexWriter). Then it moves
>  * this segment from the RAMDirectory to the target directory (usually a
>  * file system directory). This way, during indexing, IndexWriter2 will be
>  * writing segments of equal size (equal to maxDocsInRam) to the target
>  * directory. In other words, during indexing only one file-system segment
>  * is opened and dealt with, which uses just a few file handles. No more
>  * "Too many open files" exceptions.<br>
>  *
>  * After indexing is finished, it is good to call optimize() to merge all
>  * created segments into one. The RAMDirectory is out of the picture here
>  * and is not being used. Here is where we use the mergeFactor setting:
>  * A total of mergeFactor+1 segments will be merged at once into one new
>  * segment. This happens in a loop, until only 1 segment is left.
>  * Here you can get a "Too many open files" exception if your mergeFactor
>  * is large. If you set mergeFactor to 1, it will merge only 2 segments at
>  * a time, which will conserve file handles, but will be a bit slower than
>  * a merge with mergeFactor=10, for example.<br>
>  *
>  * At the end of mergeSegments() there was originally code that, if a
>  * segment file can't be deleted (because it's currently open in Windows),
>  * stores its name in a file named 'deletable', so that it can try to
>  * delete it later. I believe there was some bug with not closing the
>  * merged segments properly, which was the reason for all of this. Anyway,
>  * now there are no problems with deleting these files on Windows, and
>  * therefore the code reading and writing the 'deletable' file is
>  * commented out.<br>
>  *
>  * @author Ivaylo Zlatev (ivaylo_zlatev@yahoo.com)
>  */
> 
> 
> Two weeks ago I sent an improved PriorityQueue, fixing important memory
> issues and much more. I just wasted my time - no response at all.
> Hopefully this time my code will be more useful.
> 
> Regards, Ivaylo
>  <<IndexWriter2.java>> 
> 

> ATTACHMENT part 2 application/octet-stream name=IndexWriter2.java
> --
> To unsubscribe, e-mail:  
> <mailto:lucene-dev-unsubscribe@jakarta.apache.org>
> For additional commands, e-mail:
<mailto:lucene-dev-help@jakarta.apache.org>
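
The buffering and merge scheme described in the quoted Javadoc can be
illustrated with a small toy model. This is only a sketch, not the code
from the IndexWriter2.java attachment: the class and method names
(SegmentMergeSketch, index, optimize) are hypothetical, a segment is
represented by nothing but its document count, and two details the
Javadoc does not state are assumed: leftover documents in RAM are
flushed as a smaller final segment, and each merge pass takes the most
recently written segments.

```java
import java.util.ArrayList;
import java.util.List;

/** Toy model of the scheme above; a segment is just its document count. */
class SegmentMergeSketch {

    /** Buffers 1-document segments in RAM; writes one merged segment per maxDocsInRam docs. */
    static List<Integer> index(int totalDocs, int maxDocsInRam) {
        if (maxDocsInRam < 1) {
            throw new IllegalArgumentException("maxDocsInRam must be >= 1");
        }
        List<Integer> diskSegments = new ArrayList<>();
        int docsInRam = 0;
        for (int i = 0; i < totalDocs; i++) {
            docsInRam++;                      // each added document is a 1-document RAM segment
            if (docsInRam == maxDocsInRam) {  // buffer full: merge in RAM, then move to disk
                diskSegments.add(docsInRam);
                docsInRam = 0;
            }
        }
        if (docsInRam > 0) {
            diskSegments.add(docsInRam);      // assumption: leftovers are flushed at the end
        }
        return diskSegments;
    }

    /** Merges up to mergeFactor + 1 segments per pass until one is left; returns the pass count. */
    static int optimize(List<Integer> segments, int mergeFactor) {
        if (mergeFactor < 1) {
            throw new IllegalArgumentException("mergeFactor must be >= 1");
        }
        int passes = 0;
        while (segments.size() > 1) {
            // Segments (and so file handles) held open at once grow with mergeFactor.
            int batch = Math.min(mergeFactor + 1, segments.size());
            int mergedDocs = 0;
            for (int i = 0; i < batch; i++) {
                // assumption: the most recently written segments are merged first
                mergedDocs += segments.remove(segments.size() - 1);
            }
            segments.add(mergedDocs);
            passes++;
        }
        return passes;
    }

    public static void main(String[] args) {
        List<Integer> disk = index(25000, 10000);
        System.out.println("disk segments after indexing: " + disk);
        System.out.println("passes with mergeFactor=1:  " + optimize(new ArrayList<>(disk), 1));
        System.out.println("passes with mergeFactor=10: " + optimize(new ArrayList<>(disk), 10));
    }
}
```

In this model, 25000 documents with maxDocsInRam=10000 leave three
segments on disk (10000, 10000 and 5000 documents). optimize() then
needs two passes with mergeFactor=1 and one pass with mergeFactor=10,
which mirrors the trade-off in the Javadoc: a small mergeFactor opens
fewer segments at once but takes more passes.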


