lucene-dev mailing list archives

From "Michael McCandless (JIRA)" <j...@apache.org>
Subject [jira] Commented: (LUCENE-947) Some improvements to contrib/benchmark
Date Tue, 24 Jul 2007 10:02:31 GMT

    [ https://issues.apache.org/jira/browse/LUCENE-947?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel#action_12514936 ]

Michael McCandless commented on LUCENE-947:
-------------------------------------------

This looks great, Doron. Thanks!

I have one more mod: prefix the log prints from AddDocTask with the net elapsed
time since startup. I like to see net elapsed time while the alg is running, to get a
sense of the performance difference before the full task finishes. I will attach a new patch.
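
For illustration only, here is a minimal sketch of what prefixing a log line with net elapsed time could look like. It is not the code in the patch: the class name ElapsedLog, the method format, and the "sec -->" layout are all assumptions made up for this example.

```java
import java.util.Locale;

/** Hypothetical helper for illustration; not the AddDocTask code from the patch. */
public class ElapsedLog {
    // Wall-clock time (ms) taken as "startup" for this sketch.
    private final long startMillis;

    public ElapsedLog(long startMillis) {
        this.startMillis = startMillis;
    }

    /** Returns the message prefixed with the seconds elapsed since startMillis. */
    public String format(long nowMillis, String message) {
        double elapsedSec = (nowMillis - startMillis) / 1000.0;
        // The "sec -->" layout is an assumption, chosen only for readability.
        return String.format(Locale.ROOT, "%.2f sec --> %s", elapsedSec, message);
    }

    public static void main(String[] args) {
        ElapsedLog log = new ElapsedLog(System.currentTimeMillis());
        // In a real run this would be called once every doc.add.log.step documents.
        System.out.println(log.format(System.currentTimeMillis(), "processed 500 docs"));
    }
}
```

Passing the current time in as a parameter, rather than reading the clock inside format, keeps the helper easy to check with fixed inputs.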

> Some improvements to contrib/benchmark
> --------------------------------------
>
>                 Key: LUCENE-947
>                 URL: https://issues.apache.org/jira/browse/LUCENE-947
>             Project: Lucene - Java
>          Issue Type: Improvement
>          Components: contrib/benchmark
>            Reporter: Michael McCandless
>            Assignee: Michael McCandless
>            Priority: Minor
>         Attachments: LUCENE-947.patch, LUCENE-947.take2.patch, LUCENE-947.take3.patch,
>                      LUCENE-947.take4.patch
>
>
> I've made some small improvements to the contrib/benchmark, mostly
> merging in the ad-hoc benchmarking code I've been using in LUCENE-843:
>   - Fixed thread safety of DirDocMaker's usage of SimpleDateFormat
>   - Print the props in sorted order
>   - Added new config "autocommit=true|false" to CreateIndexTask
>   - Added new config "ram.flush.mb=int" to AddDocTask
>   - Added new configs "doc.term.vector.positions=true|false" and
>     "doc.term.vector.offsets=true|false" to BasicDocMaker
>   - Added WriteLineDocTask.java, so you can make an alg that uses this
>     to build up a single file containing one document per line.
>     EG this alg converts the reuters-out tree into a
>     single file that has ~1000 bytes per body field, saved to
>     work/reuters.1000.txt:
>       docs.dir=reuters-out
>       doc.maker=org.apache.lucene.benchmark.byTask.feeds.DirDocMaker
>       line.file.out=work/reuters.1000.txt
>       doc.maker.forever=false
>       {WriteLineDoc(1000)}: *
>     Each line has tab-separated TITLE, DATE, BODY fields.
>   - Created feeds/LineDocMaker.java that creates documents read from
>     the file created by WriteLineDocTask.java.  EG this alg indexes
>     all documents created above:
>       analyzer=org.apache.lucene.analysis.SimpleAnalyzer
>       directory=FSDirectory
>       doc.add.log.step=500
>       docs.file=work/reuters.1000.txt
>       doc.maker=org.apache.lucene.benchmark.byTask.feeds.LineDocMaker
>       doc.tokenized=true
>       doc.maker.forever=false
>       ResetSystemErase
>       CreateIndex
>       {AddDoc}: *
>       CloseIndex
>       RepSumByPref AddDoc
> I'll attach an initial patch shortly.
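
As a rough illustration of the one-document-per-line format described in the issue (tab-separated TITLE, DATE, BODY), the sketch below writes and reads such a line. It is not the WriteLineDocTask or LineDocMaker code: the class LineDocFormat and its methods are hypothetical, and replacing tabs and newlines inside a field with spaces is an assumption about how the separators might be kept unambiguous.

```java
/** Hypothetical sketch of a tab-separated line-doc format; not the patch's code. */
public class LineDocFormat {
    static final char SEP = '\t';

    // Assumption: tabs and line breaks inside a field are replaced by spaces
    // so that each document stays on one line with exactly three fields.
    static String clean(String s) {
        return s.replace('\t', ' ').replace('\r', ' ').replace('\n', ' ');
    }

    /** Builds one line holding TITLE, DATE, BODY separated by tabs. */
    static String toLine(String title, String date, String body) {
        return clean(title) + SEP + clean(date) + SEP + clean(body);
    }

    /** Splits a line back into {TITLE, DATE, BODY}. */
    static String[] fromLine(String line) {
        // Limit 3 keeps an empty trailing BODY instead of dropping it.
        String[] parts = line.split("\t", 3);
        if (parts.length != 3) {
            throw new IllegalArgumentException("expected 3 tab-separated fields: " + line);
        }
        return parts;
    }

    public static void main(String[] args) {
        String line = toLine("Some title", "24-Jul-2007", "first line\nsecond line");
        String[] fields = fromLine(line);
        System.out.println(fields[0] + " | " + fields[1] + " | " + fields[2]);
    }
}
```

A reader in the spirit of LineDocMaker would call something like fromLine once per line of docs.file and build one document from the three fields.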

-- 
This message is automatically generated by JIRA.
-
You can reply to this email to add a comment to the issue online.


---------------------------------------------------------------------
To unsubscribe, e-mail: java-dev-unsubscribe@lucene.apache.org
For additional commands, e-mail: java-dev-help@lucene.apache.org

