lucene-dev mailing list archives

From Marvin Humphrey <mar...@rectangular.com>
Subject Re: Benchmarkers
Date Tue, 04 Apr 2006 08:32:11 GMT

On Apr 3, 2006, at 6:26 PM, Marvin Humphrey wrote:

>
> On Apr 3, 2006, at 5:43 PM, Doug Cutting wrote:
>
>> Marvin Humphrey wrote:
>>> Plucene is a Lucene 1.3 port, so it doesn't have  
>>> max_buffered_docs --  but I can set merge_factor to 1000.
>>
>> I would not recommend that.  With a merge factor that high you may  
>> run out of file handles, and, moreover, I doubt that disks are  
>> very efficient when reading from that many streams.
>
> Running out of filehandles is a solvable problem because you can  
> set ulimit -n to whatever on OS X -- and you pretty much have to,  
> since the default is 256.
>
> The streams issue is more complicated.  N-way merges from disk tend  
> to be IO-bound.  The best I can do is try a couple numbers and see  
> what works.  IIRC, the number 100 has gone by on the Plucene  
> mailing list as a good value.
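
For anyone reproducing this from a script rather than the shell, the file handle limit mentioned above can also be inspected and raised per process. This is only a sketch in Python (not Plucene's own Perl) using the standard `resource` module on a Unix-like system; the helper name `raise_nofile_limit` and the target of 1024 are made up for illustration.

```python
import resource


def raise_nofile_limit(wanted):
    """Try to raise the soft open-file limit to `wanted` (hypothetical helper).

    The request is capped at the hard limit. Returns the soft limit in
    effect afterwards, whether or not it changed.
    """
    soft, hard = resource.getrlimit(resource.RLIMIT_NOFILE)
    if hard != resource.RLIM_INFINITY:
        wanted = min(wanted, hard)
    if soft != resource.RLIM_INFINITY and wanted > soft:
        try:
            resource.setrlimit(resource.RLIMIT_NOFILE, (wanted, hard))
        except (ValueError, OSError):
            # Some systems reject values above a kernel maximum;
            # in that case the old limit simply stays in place.
            pass
    return resource.getrlimit(resource.RLIMIT_NOFILE)[0]


if __name__ == "__main__":
    # 1024 is an arbitrary example value, not a recommendation.
    print("soft limit on open files:", raise_nofile_limit(1024))
```

This only affects the current process and its children, much like running ulimit -n in the shell that launches the indexer.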

The higher the better, it seems.  Here are the times to index 1000 docs:

merge_factor  secs
10            141
30            123
100           107
250           100
1000           89

I suspect that Plucene is so CPU-bound that the IO doesn't come into  
play.
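
To put the table in relative terms, here is a small Python sketch that turns the timings above into throughput and speedup over the merge_factor 10 run. The numbers are copied from the table; the function name `summarize` is just illustrative.

```python
# Seconds to index 1000 docs at each merge_factor, from the table above.
timings = {10: 141, 30: 123, 100: 107, 250: 100, 1000: 89}
DOCS = 1000
BASELINE = timings[10]


def summarize(timings, docs, baseline):
    """Return (merge_factor, secs, docs_per_sec, speedup_vs_baseline) rows."""
    rows = []
    for merge_factor, secs in sorted(timings.items()):
        rows.append((merge_factor, secs, docs / secs, baseline / secs))
    return rows


if __name__ == "__main__":
    print("merge_factor  secs  docs/sec  speedup")
    for merge_factor, secs, rate, speedup in summarize(timings, DOCS, BASELINE):
        print(f"{merge_factor:<12}  {secs:>4}  {rate:8.2f}  {speedup:6.2f}x")
```

Going from merge_factor 10 to 1000 works out to roughly a 1.58x speedup, so a hundredfold increase in merge_factor cuts the run time by a bit over a third.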

Marvin Humphrey
Rectangular Research
http://www.rectangular.com/



