lucene-java-user mailing list archives

From Grant Ingersoll <gsing...@apache.org>
Subject Re: MergeFactor and MaxBufferedDocs value should ...?
Date Sat, 24 Mar 2007 11:43:14 GMT
I would also suggest that contrib/benchmark in the source has a nice
framework for experimenting with different values for mergeFactor
and maxBufferedDocs.  It is quite easy to set it up for a new
collection (i.e. yours) and run experiments that alter these two values.

Below is a sample "algorithm" file that I have been trying out.  To
make it work on your collection, you only need to implement a DocMaker
that works for it (you probably already have the code for making
Documents; you just need to put it behind the DocMaker interface
and plug it in).
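
For the Document-building part, a custom maker mostly ends up wrapping
code like the sketch below.  The class name, field names and the place
the raw data comes from are all placeholders, and the exact DocMaker /
BasicDocMaker method names can differ between revisions of
contrib/benchmark, so check the interface in your checkout before
wiring it in:

import org.apache.lucene.document.Document;
import org.apache.lucene.document.Field;

// Placeholder class: only shows the Document construction you would
// plug into a DocMaker implementation; it is not the real benchmark
// interface.
public class MyCollectionDocBuilder {

    // Build a Lucene Document from one record of your collection.
    // "docid", "date", "title" and "body" are example field names.
    public Document makeDocument(String id, String date,
                                 String title, String body) {
        Document doc = new Document();
        doc.add(new Field("docid", id, Field.Store.YES, Field.Index.UN_TOKENIZED));
        doc.add(new Field("date", date, Field.Store.YES, Field.Index.UN_TOKENIZED));
        doc.add(new Field("title", title, Field.Store.YES, Field.Index.TOKENIZED));
        doc.add(new Field("body", body, Field.Store.YES, Field.Index.TOKENIZED));
        return doc;
    }
}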


merge.factor=merge:10:100:1000:5000:10:10:10:10:100:1000:100:100
max.buffered=max.buffered:10:10:10:10:100:1000:10000:21580:21580:21580:1000:10000
compound=true

analyzer=org.apache.lucene.analysis.standard.StandardAnalyzer
directory=FSDirectory
#directory=RamDirectory

doc.stored=true
doc.tokenized=true
doc.term.vector=false
doc.add.log.step=1000

docs.dir=reuters-out
#docs.dir=reuters-111

#doc.maker=org.apache.lucene.benchmark.byTask.feeds.SimpleDocMaker
doc.maker=org.apache.lucene.benchmark.byTask.feeds.ReutersDocMaker

#query.maker=org.apache.lucene.benchmark.byTask.feeds.SimpleQueryMaker
query.maker=org.apache.lucene.benchmark.byTask.feeds.ReutersQueryMaker

# tasks at this depth or less will print when they start
task.max.depth.log=2

log.queries=true
# -------------------------------------------------------------------------------------

{ "Rounds"

     ResetSystemErase

     { "Populate-Opt"
         CreateIndex
         { "MAddDocs" AddDoc > : 22000
         Optimize
         CloseIndex
     }

     NewRound

} : 13

RepSumByName
RepSumByPrefRound MAddDocs
RepSumByPrefRound Populate-Opt
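
If memory serves, the easiest way to run an algorithm file like this is
the ant target in contrib/benchmark (target and property names from
memory, so double-check build.xml), e.g.:

  ant run-task -Dtask.alg=conf/your-test.alg

where conf/your-test.alg is whatever you saved the file above as.  The
same can be done by running the
org.apache.lucene.benchmark.byTask.Benchmark class directly with the
.alg file as its argument.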


On Mar 23, 2007, at 2:51 AM, SK R wrote:

> Hi,
>    I've looked at the uses of MergeFactor and MaxBufferedDocs.
>
>    If I set MergeFactor = 100 and MaxBufferedDocs = 250, then the first
> 100 segments will be merged in the RAMDir when 100 docs have arrived.  At
> the end of the 350th doc added to the writer, the RAMDir has 2 merged
> segment files + 50 separate segment files not merged together, and these
> are flushed to the FSDir.
>
>    If wrong, please correct me.
>
>    My doubt is whether we should set MergeFactor & MaxBufferedDocs in a
> proportional ratio (i.e. MaxBufferedDocs = n*MergeFactor where n = 1, 2, ...)
> to reduce indexing time and get better performance, or whether there is
> no need to worry about their relation.
>
>
> Thanks & Regards
> RSK
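
For what it is worth, both of those are plain setters on IndexWriter, so
a quick sweep can also be scripted outside the benchmark framework.  A
rough sketch (class name and index path are placeholders, and the
add-documents loop is elided):

import org.apache.lucene.analysis.standard.StandardAnalyzer;
import org.apache.lucene.index.IndexWriter;

// Placeholder class; just shows where the two values under discussion
// get set on the writer.
public class MergeSettingsSweep {
    public static void main(String[] args) throws Exception {
        IndexWriter writer =
            new IndexWriter("sweep-index", new StandardAnalyzer(), true);

        // The two values being asked about; vary them and time the run.
        writer.setMergeFactor(100);
        writer.setMaxBufferedDocs(250);

        // ... add your Documents here ...

        writer.optimize();
        writer.close();
    }
}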

--------------------------
Grant Ingersoll
Center for Natural Language Processing
http://www.cnlp.org

Read the Lucene Java FAQ at http://wiki.apache.org/jakarta-lucene/LuceneFAQ




