lucene-dev mailing list archives

From Otis Gospodnetic <otis_gospodne...@yahoo.com>
Subject Re: [PATCH] IndexWriter : controlling the number of Docs merged
Date Sun, 12 Oct 2003 17:02:18 GMT
Thanks, Julien. I put your patch in Bugzilla so we don't lose it.
http://nagoya.apache.org/bugzilla/show_bug.cgi?id=23754

Otis


--- fp235-5 <julien.nioche@lingway.com> wrote:
> Sorry, here is the patch ;-) 
> 
> 
> ---------- Start of original message -----------
> 
> From   : "fp235-5" <julien.nioche@lingway.com>
> To     : "lucene-dev" <lucene-dev@jakarta.apache.org>
> Cc     : 
> Date   : Sat, 20 Sep 2003 16:06:06 +0200
> Subject: [PATCH] IndexWriter : controlling the number of Docs merged 
> 
> Hello, 
> 
> Someone made a suggestion yesterday about adding a variable to IndexWriter
> in order to control the number of Documents merged in the RAMDirectory
> independently of the mergeFactor. (I'm sorry, I don't remember exactly who,
> and the mail arrived at my office.)
> I'm proposing a tiny modification of IndexWriter to add this functionality.
> A new variable, minMergeDocs, specifies the number of Documents to be merged
> in memory before starting a new Segment. The mergeFactor still controls the
> number of Segments created in the Directory, so it's possible to avoid
> running into the limit on the number of open files.
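To make the interaction between minMergeDocs and mergeFactor concrete, here is a small, hypothetical Java simulation. It is not part of the patch and not Lucene code: segments are modeled only by their document counts, no Directory is involved, and the class name MergePolicySim and its methods are illustrative. It assumes that every added document starts as a one-document segment and that the rest of maybeMergeSegments() (only its first lines appear in the diff) walks backwards over the newest segments smaller than the target, merges them once they add up to the target, and then multiplies the target by mergeFactor.

```java
import java.util.ArrayList;
import java.util.List;

// Hypothetical simulation of the patched merge policy; not Lucene code.
// Each segment is represented only by its document count, oldest first.
class MergePolicySim {
    final int minMergeDocs;  // the new knob: docs accumulated before the first merge
    final int mergeFactor;   // how much each merge level grows over the previous one
    final int maxMergeDocs;  // merge levels above this size are never attempted
    final List<Integer> segments = new ArrayList<>();

    MergePolicySim(int minMergeDocs, int mergeFactor, int maxMergeDocs) {
        if (minMergeDocs < 1 || mergeFactor < 2) {
            throw new IllegalArgumentException("need minMergeDocs >= 1 and mergeFactor >= 2");
        }
        this.minMergeDocs = minMergeDocs;
        this.mergeFactor = mergeFactor;
        this.maxMergeDocs = maxMergeDocs;
    }

    // Assumption: every added document starts out as its own one-document segment.
    void addDocument() {
        segments.add(1);
        maybeMergeSegments();
    }

    // Assumed shape of the full loop; only its first lines appear in the diff.
    private void maybeMergeSegments() {
        long targetMergeDocs = minMergeDocs;  // the patched line (was mergeFactor)
        while (targetMergeDocs <= maxMergeDocs) {
            // walk back over the newest segments that are smaller than the target
            int minSegment = segments.size();
            long mergeDocs = 0;
            while (--minSegment >= 0) {
                int docCount = segments.get(minSegment);
                if (docCount >= targetMergeDocs) break;
                mergeDocs += docCount;
            }
            if (mergeDocs < targetMergeDocs) break;  // not enough small segments yet
            mergeSegments(minSegment + 1);
            targetMergeDocs *= mergeFactor;  // try the next, larger level
        }
    }

    // Replace segments [from, end) with a single segment holding all their documents.
    private void mergeSegments(int from) {
        List<Integer> tail = segments.subList(from, segments.size());
        int total = 0;
        for (int docCount : tail) total += docCount;
        tail.clear();
        segments.add(total);
    }

    public static void main(String[] args) {
        MergePolicySim patched = new MergePolicySim(100, 10, Integer.MAX_VALUE);
        MergePolicySim unpatched = new MergePolicySim(10, 10, Integer.MAX_VALUE);
        for (int i = 0; i < 1050; i++) {
            patched.addDocument();
            unpatched.addDocument();
        }
        System.out.println("minMergeDocs=100: " + patched.segments);
        System.out.println("minMergeDocs=10:  " + unpatched.segments);
    }
}
```

Under these assumptions, adding 1,050 documents with minMergeDocs = 100 and mergeFactor = 10 leaves one 1,000-document segment plus 50 one-document segments still waiting to be merged (per the patch's Javadoc, the real IndexWriter keeps these in a RAMDirectory). With minMergeDocs = mergeFactor = 10, which is the behavior before the patch, the same run leaves a 1,000-document segment and five 10-document segments. The larger levels still grow by mergeFactor in both cases.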
> 
> The diff file is attached.
> 
> As Dmitry and Erik noticed, there are no real JUnit tests. I'd be happy to
> write a JUnit test for this feature. The problem is that the segmentInfos
> field is private in IndexWriter, so a test can't use it to check the number
> and size of the Segments. I ran a test using IndexWriter's infoStream
> variable, and everything seems to be OK.
> 
> Any comments / suggestions are welcome. 
> 
> Regards
> 
> Julien
> 
> 
> 
> 
> ---------------------------------------------------------------------
> To unsubscribe, e-mail: lucene-dev-unsubscribe@jakarta.apache.org
> For additional commands, e-mail: lucene-dev-help@jakarta.apache.org
> 
> 
> Index: IndexWriter.java
> ===================================================================
> RCS file: /home/cvspublic/jakarta-lucene/src/java/org/apache/lucene/index/IndexWriter.java,v
> retrieving revision 1.15
> diff -u -r1.15 IndexWriter.java
> --- IndexWriter.java	15 Sep 2003 12:40:23 -0000	1.15
> +++ IndexWriter.java	20 Sep 2003 12:22:13 -0000
> @@ -249,6 +249,16 @@
>     *
>     * <p>This must never be less than 2.  The default value is 10.*/
>    public int mergeFactor = 10;
> +  
> +  /** Determines the minimal number of documents required before merging
> +   * and starting a new Segment. Since Documents are merged in a 
> +   * {@link org.apache.lucene.store.RAMDirectory}, large value gives faster 
> +   * indexing. At the same time mergeFactor limits the number of files open in 
> +   * a FSDirectory.
> +   * 
> +   * <p> The default value is 10.*/
> +  public int minMergeDocs = 10;
> +  
>  
>    /** Determines the largest number of documents ever merged by addDocument().
>     * Small values (e.g., less than 10,000) are best for interactive indexing,
> @@ -316,7 +326,7 @@
>  
>    /** Incremental segment merger.  */
>    private final void maybeMergeSegments() throws IOException {
> -    long targetMergeDocs = mergeFactor;
> +    long targetMergeDocs = minMergeDocs;
>      while (targetMergeDocs <= maxMergeDocs) {
>        // find segments smaller than current target size
>        int minSegment = segmentInfos.size();

