lucene-dev mailing list archives

From "Michael McCandless (JIRA)" <>
Subject [jira] Commented: (LUCENE-947) Some improvements to contrib/benchmark
Date Mon, 23 Jul 2007 10:34:31 GMT


Michael McCandless commented on LUCENE-947:

Thanks for the review Doron!

> 1. TestPerfTasksParse - why do you prevent the testing of parsing of WriteLineDoc? 
>     I disabled the special handling of this and the test works as supposed.

Hmmm ... I was seeing a failure if I didn't do that because
WriteLineDoc requires "line.file.out" Config to be set and that test
didn't know to do so.  I'll put it back into the test but add
"line.file.out" for this task.
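
A minimal sketch of that fix, assuming the test builds a java.util.Properties object for each task before parsing it. The helper name propsForTask and the output path are hypothetical and are not taken from the patch:

```java
import java.util.Properties;

class TaskPropsSketch {
    // Hypothetical helper: builds the properties a parse test would hand to a task.
    static Properties propsForTask(String taskName) {
        Properties props = new Properties();
        if ("WriteLineDoc".equals(taskName)) {
            // WriteLineDoc needs "line.file.out" to be set; the path is a placeholder.
            props.setProperty("line.file.out", "work/test-lines.txt");
        }
        return props;
    }

    public static void main(String[] args) {
        System.out.println(propsForTask("WriteLineDoc").getProperty("line.file.out"));
        System.out.println(propsForTask("AddDoc").getProperty("line.file.out"));
    }
}
```

Only the task that needs the property gets it, so the other tasks are still parsed with their default config.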

> 2. Documentation of new properties is missing:
>      - In CreateIndexTask: ram.flush.mb [0],  autocommit [true]
>      - In byTask.package.html (same 2 props).

OK, I'll add this and also for "doc.term.vector.{offsets,positions}"
to BasicDocMaker.

> 3. ram.flush.mb & autocommit should be added & used & documented also in OpenIndexTask
>     (currently only used in CreateIndexTask).

OK, I'll add this.

> 4. AddDocTask:  flushAtRAMUsage - unused?

Yup, this was left over from pre LUCENE-843, where you had to check RAM
usage after each doc and then flush.  I'll remove it and just revert
AddDocTask to its current version (I don't need any mods here).
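
For readers unfamiliar with the older pattern, here is a rough sketch of "check RAM usage after each doc and then flush". It uses a hypothetical in-memory buffer as a stand-in for the index writer, and the size estimate (string length) is an assumption made only for illustration. It is not the code that was removed from AddDocTask:

```java
import java.util.ArrayList;
import java.util.List;

class RamFlushSketch {
    private final long flushAtBytes;                    // threshold, analogous to a RAM limit
    private final List<String> buffered = new ArrayList<String>();
    private long bufferedBytes = 0;
    private int flushCount = 0;

    RamFlushSketch(long flushAtBytes) {
        this.flushAtBytes = flushAtBytes;
    }

    // Buffer one doc, then check the estimated usage and flush if over the limit.
    void addDoc(String doc) {
        buffered.add(doc);
        bufferedBytes += doc.length();                  // crude size estimate (assumption)
        if (bufferedBytes >= flushAtBytes) {
            flush();
        }
    }

    // Stand-in for writing the buffered docs out; here it only clears the buffer.
    void flush() {
        buffered.clear();
        bufferedBytes = 0;
        flushCount++;
    }

    int getFlushCount() { return flushCount; }
    long getBufferedBytes() { return bufferedBytes; }

    public static void main(String[] args) {
        RamFlushSketch writer = new RamFlushSketch(10);
        writer.addDoc("12345");
        writer.addDoc("123456");
        System.out.println("flushes: " + writer.getFlushCount());
    }
}
```

With flushing by RAM usage handled through the ram.flush.mb setting, the caller no longer needs this per-document check.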

> 5. build.xml - 1024m as default for running a benchmark seems too much?
>     I mean, one of the nice things about Lucene is that it can run for you even if you
>     only have a few MB of RAM to spare. For someone with a low-end machine, say 512M only, the
>     JVM might fail to even start, right?

Whoops... I didn't mean to put this change in; I had raised it because
I was hitting OOM on some Wikipedia algs.  I'll leave it where it
was (140 MB) and remove the "-server" jvmarg as well.

> 6. I like your change of factoring some of the field names into consts. We should probably
>     do the same for the rest.

OK I'll pull out the remaining ones...

> 7. I didn't try the new WriteLineDocTask and LineDocMaker feed. Partly because there
>     was no ready-to-use alg for that under conf/, and also no test for that. Do you think we should
>     add at least one of these two (preferably both)?  - I can help with this.

OK I'll do both of these.

> Some improvements to contrib/benchmark
> --------------------------------------
>                 Key: LUCENE-947
>                 URL:
>             Project: Lucene - Java
>          Issue Type: Improvement
>          Components: contrib/benchmark
>            Reporter: Michael McCandless
>            Assignee: Michael McCandless
>            Priority: Minor
>         Attachments: LUCENE-947.patch, LUCENE-947.take2.patch
> I've made some small improvements to the contrib/benchmark, mostly
> merging in the ad-hoc benchmarking code I've been using in LUCENE-843:
>   - Fixed thread safety of DirDocMaker's usage of SimpleDateFormat
>   - Print the props in sorted order
>   - Added new config "autocommit=true|false" to CreateIndexTask
>   - Added new config "ram.flush.mb=int" to AddDocTask
>   - Added new configs "doc.term.vector.positions=true|false" and
>     "doc.term.vector.offsets=true|false" to BasicDocMaker
>   - Added, so you can make an alg that uses this
>     to build up a single file containing one document per line.
>     EG this alg converts the reuters-out tree into a
>     single file that has ~1000 bytes per body field, saved to
>     work/reuters.1000.txt:
>       docs.dir=reuters-out
>       doc.maker=org.apache.lucene.benchmark.byTask.feeds.DirDocMaker
>       line.file.out=work/reuters.1000.txt
>       doc.maker.forever=false
>       {WriteLineDoc(1000)}: *
>     Each line has tab-separated TITLE, DATE, BODY fields.
>   - Created feeds/ that creates documents read from
>     the file created by  EG this alg indexes
>     all documents created above:
>       analyzer=org.apache.lucene.analysis.SimpleAnalyzer
>       directory=FSDirectory
>       doc.add.log.step=500
>       docs.file=work/reuters.1000.txt
>       doc.maker=org.apache.lucene.benchmark.byTask.feeds.LineDocMaker
>       doc.tokenized=true
>       doc.maker.forever=false
>       ResetSystemErase
>       CreateIndex
>       {AddDoc}: *
>       CloseIndex
>       RepSumByPref AddDoc
> I'll attach initial patch shortly.
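
The one-document-per-line format described above (tab-separated TITLE, DATE, BODY) can be sketched roughly as follows. The class and method names are hypothetical, and replacing embedded tabs and newlines with spaces is an assumption about how a line would be kept parseable; the patch itself may handle this differently:

```java
class LineDocSketch {
    static final char SEP = '\t';

    // Joins the three fields into a single line of text.
    static String toLine(String title, String date, String body) {
        return clean(title) + SEP + clean(date) + SEP + clean(body);
    }

    // Replaces characters that would break the one-line, tab-separated layout.
    static String clean(String s) {
        return s.replace('\t', ' ').replace('\n', ' ').replace('\r', ' ');
    }

    // Splits a line back into {title, date, body}; limit -1 keeps empty trailing fields.
    static String[] parse(String line) {
        String[] parts = line.split("\t", -1);
        if (parts.length != 3) {
            throw new IllegalArgumentException(
                "expected 3 tab-separated fields, got " + parts.length);
        }
        return parts;
    }

    public static void main(String[] args) {
        String line = toLine("Some title", "23-Jul-2007", "first line\nsecond line");
        String[] fields = parse(line);
        System.out.println(fields[0] + " | " + fields[1] + " | " + fields[2]);
    }
}
```

Keeping each document on one line means a reader can rebuild documents with a plain line-by-line read and a single split, with no directory traversal.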

This message is automatically generated by JIRA.
You can reply to this email to add a comment to the issue online.
