lucene-dev mailing list archives

From "Doron Cohen (JIRA)"
Subject [jira] Commented: (LUCENE-947) Some improvements to contrib/benchmark
Date Mon, 23 Jul 2007 21:01:35 GMT


Doron Cohen commented on LUCENE-947:

Thanks for fixing this Michael, and as usual so fast! 

I was able to run the new alg files and the new tests.

A few more comments:

WriteLineDocTask does all of its work in setup(). This seems a bit wrong: usually only preparation
is done in setup(), while the real work (the things we measure) belongs in doLogic(). It
would probably make more sense to move the file handling code from the constructor to setup(),
and the doc creation code (except for the docMaker extraction) from setup() to doLogic(). This
should also prevent the error in TestPerfTasksParse (I think no changes would then be required
in that test).
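
To make the suggested split concrete, here is a minimal sketch. It does not use the real benchmark classes: SketchTask, SketchWriteLineTask and SketchTaskDemo are hypothetical stand-ins, and a StringBuilder stands in for the output file. The constructor only records configuration, setup() prepares the output, and doLogic() does the work that would be measured.

```java
// Hypothetical stand-in for the benchmark task base class (not the real API).
abstract class SketchTask {
    // Preparation only; not part of the measured work.
    void setup() {}

    // The measured work; returns the number of items processed.
    abstract int doLogic();
}

// Hypothetical line-writing task following the suggested split.
class SketchWriteLineTask extends SketchTask {
    private final StringBuilder sink;  // stands in for the output file
    private final String[] nextDoc;    // {title, date, body}, stands in for a doc maker
    private StringBuilder out;         // stays null until setup() has run

    SketchWriteLineTask(StringBuilder sink, String title, String date, String body) {
        // The constructor only records configuration; it opens nothing.
        this.sink = sink;
        this.nextDoc = new String[] {title, date, body};
    }

    @Override
    void setup() {
        // In a real task, opening the output file would happen here.
        out = sink;
    }

    @Override
    int doLogic() {
        // Doc creation and writing are the measured part.
        out.append(String.join("\t", nextDoc)).append('\n');
        return 1;
    }
}

class SketchTaskDemo {
    static String runOnce() {
        StringBuilder buffer = new StringBuilder();
        SketchTask task =
            new SketchWriteLineTask(buffer, "a title", "23-Jul-2007", "some body text");
        task.setup();
        task.doLogic();
        return buffer.toString();
    }

    public static void main(String[] args) {
        System.out.print(runOnce());
    }
}
```

With this shape, merely constructing the task (as a parsing test would) touches no files, which is why the split might avoid the test error mentioned above.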

There are unused imports and an unused dateFormat in:
 - LineDocMaker
 - WriteLineDocTask 

For LineDocMaker, I was puzzled why you chose not to implement getNextDocData() and let
BasicDocMaker create the next doc for you. I now understand this is so that the Document and
Field objects can be reused, which BasicDocMaker does not support. I would add a comment on that.
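
As an illustration of the reuse idea only, here is a small sketch. SketchField, SketchDoc and SketchLineDocMaker are hypothetical classes, not the Lucene Document/Field API or the code in the patch. It assumes the tab-separated TITLE, DATE, BODY line format from the issue description, allocates the document and its fields once, and only swaps the values for each line.

```java
import java.util.ArrayList;
import java.util.List;

// Hypothetical mutable field; stands in for a reusable index field.
class SketchField {
    final String name;
    String value = "";

    SketchField(String name) {
        this.name = name;
    }
}

// Hypothetical document holding a fixed set of fields.
class SketchDoc {
    final List<SketchField> fields = new ArrayList<>();
}

// Hypothetical maker that allocates once and only swaps values per line.
class SketchLineDocMaker {
    private final SketchField title = new SketchField("title");
    private final SketchField date = new SketchField("date");
    private final SketchField body = new SketchField("body");
    private final SketchDoc doc = new SketchDoc();

    SketchLineDocMaker() {
        doc.fields.add(title);
        doc.fields.add(date);
        doc.fields.add(body);
    }

    // Parses one TITLE<TAB>DATE<TAB>BODY line into the shared document.
    SketchDoc makeDocument(String line) {
        // Limit 3 keeps any further tabs inside the body.
        String[] parts = line.split("\t", 3);
        if (parts.length != 3) {
            throw new IllegalArgumentException("expected 3 tab-separated fields");
        }
        title.value = parts[0];
        date.value = parts[1];
        body.value = parts[2];
        return doc;  // the same instance on every call
    }
}
```

Because the same instance is returned every time, a caller has to finish with one document before asking for the next, and a maker like this one cannot be shared between threads as written.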

The new constants in BasicDocMaker can now be used in a few more places.

> Some improvements to contrib/benchmark
> --------------------------------------
>                 Key: LUCENE-947
>                 URL:
>             Project: Lucene - Java
>          Issue Type: Improvement
>          Components: contrib/benchmark
>            Reporter: Michael McCandless
>            Assignee: Michael McCandless
>            Priority: Minor
>         Attachments: LUCENE-947.patch, LUCENE-947.take2.patch, LUCENE-947.take3.patch
> I've made some small improvements to the contrib/benchmark, mostly
> merging in the ad-hoc benchmarking code I've been using in LUCENE-843:
>   - Fixed thread safety of DirDocMaker's usage of SimpleDateFormat
>   - Print the props in sorted order
>   - Added new config "autocommit=true|false" to CreateIndexTask
>   - Added new config "ram.flush.mb=int" to AddDocTask
>   - Added new configs "doc.term.vector.positions=true|false" and
>     "doc.term.vector.offsets=true|false" to BasicDocMaker
>   - Added, so you can make an alg that uses this
>     to build up a single file containing one document per line in a
>     single file.  EG this alg converts the reuters-out tree into a
>     single file that has ~1000 bytes per body field, saved to
>     work/reuters.1000.txt:
>       docs.dir=reuters-out
>       doc.maker=org.apache.lucene.benchmark.byTask.feeds.DirDocMaker
>       line.file.out=work/reuters.1000.txt
>       doc.maker.forever=false
>       {WriteLineDoc(1000)}: *
>     Each line has tab-separated TITLE, DATE, BODY fields.
>   - Created feeds/ that creates documents read from
>     the file created by  EG this alg indexes
>     all documents created above:
>       analyzer=org.apache.lucene.analysis.SimpleAnalyzer
>       directory=FSDirectory
>       doc.add.log.step=500
>       docs.file=work/reuters.1000.txt
>       doc.maker=org.apache.lucene.benchmark.byTask.feeds.LineDocMaker
>       doc.tokenized=true
>       doc.maker.forever=false
>       ResetSystemErase
>       CreateIndex
>       {AddDoc}: *
>       CloseIndex
>       RepSumByPref AddDoc
> I'll attach initial patch shortly.

This message is automatically generated by JIRA.
You can reply to this email to add a comment to the issue online.
