lucene-java-user mailing list archives

Site index · List index
Message view « Date » · « Thread »
Top « Date » · « Thread »
From Grant Ingersoll <gsing...@apache.org>
Subject contrib/benchmark questions
Date Sun, 18 Mar 2007 17:16:14 GMT
I'm using contrib/benchmark to do some tests for my ApacheCon talk  
and have some questions.

1. In looking at micro-standard.alg, it seems like not all braces are  
closed.  Is a line ending a separator too?
2. Is there anyway to dump out what params are supported by the  
various tasks?  I am esp. uncertain on the Search related tasks.
3. Is there anyway to dump out the stats as a CSV file or something?   
Would I implement a Task for this?  Ultimately, I want to be able to  
create a graph in Excel that shows tradeoffs between speed and memory.
4. Is there a way to set how many tabs occur between columns in the  
final report?  They merge and buffer factors get hard to read for  
larger values.
5. Below is my "alg" file, any tips?  What I am trying to do is show  
the tradeoffs of merge factor and max buffered and how it relates to  
memory and indexing time.  I want to process all the documents in the  
Reuters benchmark collection, not the 2000 in the micro-standard.  I  
don't want any pauses and for now I am happy doing things in serial.   
I think it is doing what I want, but am not 100% certain.

-----------  alg file --------

#last value is more than all the docs in reuters
merge.factor=mrg:10:100:1000:5000:10:10:10:10:100:1000
max.buffered=buf:10:10:10:10:100:1000:10000:21580:21580:21580
compound=true

analyzer=org.apache.lucene.analysis.standard.StandardAnalyzer
directory=FSDirectory
#directory=RamDirectory

doc.stored=true
doc.tokenized=true
doc.term.vector=false
doc.add.log.step=1000

docs.dir=reuters-out
#docs.dir=reuters-111

#doc.maker=org.apache.lucene.benchmark.byTask.feeds.SimpleDocMaker
doc.maker=org.apache.lucene.benchmark.byTask.feeds.ReutersDocMaker

#query.maker=org.apache.lucene.benchmark.byTask.feeds.SimpleQueryMaker
query.maker=org.apache.lucene.benchmark.byTask.feeds.ReutersQueryMaker

# task at this depth or less would print when they start
task.max.depth.log=2

log.queries=true
#  
------------------------------------------------------------------------ 
-------------

{ "Rounds"

     ResetSystemErase

     { "Populate"
         CreateIndex
         { "MAddDocs" AddDoc > : 22000
         Optimize
         CloseIndex
     }

     OpenReader
     { "SearchSameRdr" Search > : 5000
     CloseReader

     { "WarmNewRdr" Warm > : 50

     { "SrchNewRdr" Search > : 500

     { "SrchTrvNewRdr" SearchTrav > : 300

     { "SrchTrvRetNewRdr" SearchTravRet > : 100

     NewRound

} : 10

RepSumByName
RepSumByPrefRound MAddDocs


Thanks,
Grant

---------------------------------------------------------------------
To unsubscribe, e-mail: java-user-unsubscribe@lucene.apache.org
For additional commands, e-mail: java-user-help@lucene.apache.org


Mime
View raw message