lucene-java-user mailing list archives

From Stephane Vaucher <vauch...@cirano.qc.ca>
Subject RE: Lucene on Windows
Date Tue, 21 Oct 2003 21:38:53 GMT
Hi Tate (didn't know you were lurking on the list),

I've found that it's often unclear what truly affects performance. 
Batch-indexing a data set of 250,000 docs (10 fields each) on a machine 
with 2 GB of DDR400 RAM, I tested a few merge factors and found that 50 
seemed optimal, and even then performance wasn't much better than with 
a MF of 20. Nowadays HDs and OSs can apply so many hidden optimisations 
that it's worth testing each configuration you actually use.

sv

On Tue, 21 Oct 2003, Tate Avery wrote:

> Doug,
> 
> Re: high merge factor.  I was building test indexes, and writing out 300 segments of
> 300 docs each and then merging them every 90,000 docs kept the 'merging' time down to
> a minimum (for my slowish HD).
> 
> I was assuming that 11 of these large merges during the indexing of 1,000,000 docs
> (plus a final optimize) would be faster than the 10,000 little merges needed for the
> same corpus with mergeFactor set to 10.
> 
> Maybe this is not the case.
> 
> 
> 
> 
> Tate
> 
> 
> -----Original Message-----
> From: Doug Cutting [mailto:cutting@lucene.com]
> Sent: October 21, 2003 12:37 PM
> To: Lucene Users List
> Subject: Re: Lucene on Windows
> 
> 
> Tate Avery wrote:
> > You might have trouble with "too many open files" if you set your mergeFactor too
> > high.  For example, on my Win2k, I can go up to mergeFactor=300 (or so).  At 400 I
> > get a too many open files error.  Note: the default mergeFactor of 10 should give
> > no trouble.
> 
> Please note that it is never recommended that you set mergeFactor 
> anywhere near this high.  I don't know why folks do this.  It really 
> doesn't make indexing much faster, and it makes searching slower if you 
> don't optimize.  It's a bad idea.  The default setting of 10 works 
> pretty well.  I've also had good experience setting it as high as 50 on 
> big batch indexing runs, but do not recommend setting it much higher 
> than that.  Even then, this can cause problems if you need to use 
> several indexes at once, or you have lots of fields.
> 
> Doug
> 
> 
> ---------------------------------------------------------------------
> To unsubscribe, e-mail: lucene-user-unsubscribe@jakarta.apache.org
> For additional commands, e-mail: lucene-user-help@jakarta.apache.org
> 
> 
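
To make the merge counts discussed above easier to compare, here is a rough 
back-of-the-envelope sketch. It is not Lucene code. It assumes that every 
document starts as its own one-doc segment, that any mergeFactor segments on 
one level are merged into a single segment on the next level, and that 
leftover segments and the final optimize are ignored. The class name, the 
method names, and the figure of about 7 fixed files per segment plus one 
norms file per indexed field (non-compound index layout) are all assumptions 
made for illustration.

```java
import java.util.ArrayList;
import java.util.List;

// Back-of-the-envelope model only; this is not Lucene code.
final class MergeCountSketch {

    // Number of merges on each level while indexing numDocs documents.
    // Assumption: each document starts as a one-doc segment, and every
    // mergeFactor segments on a level are merged into one segment on the
    // next level. Leftover segments and the final optimize are ignored.
    static List<Long> mergesPerLevel(long numDocs, int mergeFactor) {
        if (mergeFactor < 2) {
            throw new IllegalArgumentException("mergeFactor must be >= 2");
        }
        List<Long> perLevel = new ArrayList<>();
        long segments = numDocs;
        while (segments >= mergeFactor) {
            segments /= mergeFactor;   // merges performed on this level
            perLevel.add(segments);
        }
        return perLevel;
    }

    // Rough count of files open while merging mergeFactor segments.
    // Assumption: about 7 fixed files per segment plus one norms file
    // per indexed field; real numbers depend on version and settings.
    static long openFilesDuringMerge(int mergeFactor, int indexedFields) {
        final int fixedFilesPerSegment = 7; // assumed, not measured
        return (long) mergeFactor * (fixedFilesPerSegment + indexedFields);
    }

    public static void main(String[] args) {
        System.out.println("MF=10:  " + mergesPerLevel(1_000_000L, 10));
        System.out.println("MF=300: " + mergesPerLevel(1_000_000L, 300));
        System.out.println("files at MF=300, 10 fields: "
                + openFilesDuringMerge(300, 10));
    }
}
```

Under these assumptions, mergeFactor=300 gives 3,333 small merges plus the 11 
large ones mentioned above, while mergeFactor=10 gives six levels of merges, 
from 100,000 down to 1. The sketch says nothing about how long each merge 
takes, which is why measuring on the actual hardware still matters. The file 
estimate (5,100 for mergeFactor=300 with 10 indexed fields) is only meant to 
show why a high merge factor combined with many fields can run into "too 
many open files".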



