lucene-dev mailing list archives

From "Doron Cohen (JIRA)" <j...@apache.org>
Subject [jira] Commented: (LUCENE-967) Add "tokenize documents only" task to contrib/benchmark
Date Wed, 01 Aug 2007 00:32:53 GMT

    [ https://issues.apache.org/jira/browse/LUCENE-967?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel#action_12516837 ]

Doron Cohen commented on LUCENE-967:
------------------------------------

Also, I think the added printing of elapsed time is redundant, 
because you already get it as the elapsed time reported for the 
outermost task sequence. (?)

For instance, if you add this line to tokenize.alg:
     RepSumByName
you get this output:
     Operation   round   runCnt   recsPerRun        rec/s  elapsedSec    avgUsedMem    avgTotalMem
     Seq_Exhaust     0        1        21578        638.2       33.81    15,694,368     20,447,232
     Net elapsed time: 33.809 sec
So the total elapsed time is actually printed twice now - do we need this?
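
As a quick sanity check that the two figures are the same measurement, here is a minimal sketch that recomputes the report's numbers from the values shown above. It assumes rec/s is simply recsPerRun divided by elapsedSec and that elapsedSec is the net elapsed time rounded to two decimals; the class and method names are hypothetical and not part of contrib/benchmark.

```java
import java.util.Locale;

// Hypothetical helper, not part of contrib/benchmark.
public class RepSummaryCheck {

    // Assumption: rec/s is the number of records divided by elapsed seconds.
    static double recsPerSec(long recsPerRun, double elapsedSec) {
        return recsPerRun / elapsedSec;
    }

    // Formats seconds with two decimals, like the elapsedSec column.
    static String twoDecimals(double seconds) {
        return String.format(Locale.ROOT, "%.2f", seconds);
    }

    public static void main(String[] args) {
        long recsPerRun = 21578;       // Seq_Exhaust row, recsPerRun column
        double elapsedSec = 33.81;     // Seq_Exhaust row, elapsedSec column
        double netElapsedSec = 33.809; // the separate "Net elapsed time" line

        // Both lines should print the values already shown in the report.
        System.out.println(String.format(Locale.ROOT, "rec/s = %.1f",
                recsPerSec(recsPerRun, elapsedSec)));
        System.out.println("net elapsed, rounded = " + twoDecimals(netElapsedSec));
    }
}
```

If the assumptions hold, the rounded net elapsed time matches the elapsedSec column, which is the duplication in question.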


> Add "tokenize documents only" task to contrib/benchmark
> -------------------------------------------------------
>
>                 Key: LUCENE-967
>                 URL: https://issues.apache.org/jira/browse/LUCENE-967
>             Project: Lucene - Java
>          Issue Type: Improvement
>          Components: contrib/benchmark
>    Affects Versions: 2.3
>            Reporter: Michael McCandless
>            Assignee: Michael McCandless
>            Priority: Minor
>             Fix For: 2.3
>
>         Attachments: LUCENE-967.patch, LUCENE-967.take2.patch
>
>
> I've been looking at performance improvements to tokenization by
> re-using Tokens, and to help benchmark my changes I've added a new
> task called ReadTokens that just steps through all fields in a
> document, gets a TokenStream, and reads all the tokens out of it.
> E.g., this alg just reads all Tokens for all docs in the Reuters collection:
>   doc.maker=org.apache.lucene.benchmark.byTask.feeds.ReutersDocMaker
>   doc.maker.forever=false
>   {ReadTokens > : *

-- 
This message is automatically generated by JIRA.
-
You can reply to this email to add a comment to the issue online.


---------------------------------------------------------------------
To unsubscribe, e-mail: java-dev-unsubscribe@lucene.apache.org
For additional commands, e-mail: java-dev-help@lucene.apache.org

