lucene-dev mailing list archives

From Grant Ingersoll <grant.ingers...@gmail.com>
Subject Re: optimize() method call
Date Wed, 18 Apr 2007 20:29:27 GMT
Has anyone done any benchmarking to approximate how long it takes to
optimize indexes of different sizes?  Is the merging linear, sub-linear,
etc.?
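
One way to answer the linear vs. sub-linear question, once timings exist, is to fit a line to log(time) against log(index size): a slope near 1 suggests linear growth, below 1 sub-linear, above 1 super-linear. The sketch below is only the analysis step. The class name, method name, and sample numbers are hypothetical, and the measurements are made up for illustration rather than taken from a real Lucene run.

```java
public class MergeScaling {

    /**
     * Least-squares slope of log(time) against log(size).
     * Assumes all sizes and times are positive and that at least
     * two distinct sizes are supplied.
     */
    public static double scalingExponent(double[] sizes, double[] times) {
        if (sizes.length != times.length || sizes.length < 2) {
            throw new IllegalArgumentException("need at least two (size, time) pairs");
        }
        int n = sizes.length;
        double sumX = 0, sumY = 0, sumXY = 0, sumXX = 0;
        for (int i = 0; i < n; i++) {
            double x = Math.log(sizes[i]);
            double y = Math.log(times[i]);
            sumX += x;
            sumY += y;
            sumXY += x * y;
            sumXX += x * x;
        }
        // Standard simple linear regression slope.
        return (n * sumXY - sumX * sumY) / (n * sumXX - sumX * sumX);
    }

    public static void main(String[] args) {
        // Hypothetical optimize() timings: document counts and seconds.
        double[] docs = {100_000, 200_000, 400_000, 800_000};
        double[] seconds = {12.0, 24.0, 48.0, 96.0};
        System.out.println("estimated exponent: " + scalingExponent(docs, seconds));
    }
}
```

The timings themselves would have to come from running optimize() on real indexes of several sizes; the fit says nothing about hardware, I/O, or document shape, so it is only a rough summary.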

On Apr 8, 2007, at 1:01 AM, Otis Gospodnetic wrote:

> I'd advise against calling optimize() at all in an environment
> whose indices are constantly updated.  That's what mergeFactor
> helps with.  Keep it low, and Lucene itself will merge segments
> more often.  If one still wants to call optimize(), you'd want to
> know how long it would take on an index of your size; if you've
> got a long enough lull, do it, otherwise postpone it.
>
> Otis
>  . . . . . . . . . . . . . . . . . . . . . . . . . . . . . .
> Simpy -- http://www.simpy.com/  -  Tag  -  Search  -  Share
>
> ----- Original Message ----
> From: Grant Ingersoll <gsingers@apache.org>
> To: java-dev@lucene.apache.org
> Sent: Friday, April 6, 2007 6:53:13 PM
> Subject: optimize() method call
>
> I was looking at the javadocs for the optimize() call on IndexWriter
> which contain a great amount of detail about what happens, but very
> little guidance on when.  I would like to add more on when.  I
> generally do optimize after I finish my indexing, which is pretty
> straightforward to determine when one has a more or less static
> collection.  What isn't so clear to me, b/c I haven't dealt w/ it too
> much is when optimize should be called in environments that are
> frequently updated.
>
> Here's what I have for text so far:
> *
>     * <p>It is recommended that this method be called upon completion
>     * of indexing.  In environments with frequent updates optimize is
>     * best FILL IN HERE
>     * </p>
>
> Essentially, I am wondering what are the best practices for calling
> optimize, especially in a frequent update environment.  My gut
> feeling is that it should just be scheduled to be done on a regular
> basis, ideally when there is a lull.  The docs allude to the fact
> that search performance will be better, but has anyone quantified
> it?  The mergeFactor docs say that a smaller merge factor results in
> faster searches on unoptimized indices (I presume that means faster
> relative to higher merge factors, but still not as fast as
> optimized, correct?).  If it hasn't been quantified, maybe I will try
> to whip up a benchmark for it.
>
> So, do people in these types of environments typically schedule
> optimize to occur at night or every few hours, or what?  I know, "It
> depends...", I am just wondering if there is a general consensus that
> would be useful to pass along to readers.
>
> --------------------------
> Grant Ingersoll
> Center for Natural Language Processing
> http://www.cnlp.org
>
> Read the Lucene Java FAQ at http://wiki.apache.org/jakarta-lucene/LuceneFAQ
>
> ---------------------------------------------------------------------
> To unsubscribe, e-mail: java-dev-unsubscribe@lucene.apache.org
> For additional commands, e-mail: java-dev-help@lucene.apache.org
>
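
The quoted advice to keep mergeFactor low can be illustrated with a toy model of logarithmic merging. It assumes that every flush writes one new segment and that whenever mergeFactor segments sit at the same level they are merged into a single segment at the next level. This is a deliberate simplification, not Lucene's actual merge code, and the class and method names are hypothetical.

```java
import java.util.ArrayList;
import java.util.List;

public class MergeFactorModel {

    /**
     * Number of segments left after the given number of flushes in the
     * simplified model: mergeFactor segments at one level collapse into
     * one segment at the next level.
     */
    public static int segmentsAfter(int flushes, int mergeFactor) {
        if (mergeFactor < 2) {
            throw new IllegalArgumentException("mergeFactor must be >= 2");
        }
        // perLevel.get(k) holds the number of segments currently at level k.
        List<Integer> perLevel = new ArrayList<>();
        for (int f = 0; f < flushes; f++) {
            int level = 0;
            while (true) {
                if (perLevel.size() <= level) {
                    perLevel.add(0);
                }
                int count = perLevel.get(level) + 1;
                if (count < mergeFactor) {
                    perLevel.set(level, count);
                    break;
                }
                // Level is full: merge everything here into one segment
                // and carry that segment up to the next level.
                perLevel.set(level, 0);
                level++;
            }
        }
        int total = 0;
        for (int c : perLevel) {
            total += c;
        }
        return total;
    }

    public static void main(String[] args) {
        System.out.println(segmentsAfter(999, 10)); // 27 segments in this model
        System.out.println(segmentsAfter(999, 2));  // 8 segments in this model
    }
}
```

In this model a lower mergeFactor leaves fewer segments for a search to visit, at the price of merging more often while indexing. Whether that closes the gap to a fully optimized index is exactly what a benchmark would have to show.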

------------------------------------------------------
Grant Ingersoll
http://www.grantingersoll.com/
http://lucene.grantingersoll.com
http://www.paperoftheweek.com/


