lucene-dev mailing list archives

From "Doron Cohen (JIRA)" <j...@apache.org>
Subject [jira] Commented: (LUCENE-1209) If setConfig(Config config) is called in resetInputs(), you can turn term vectors off and on by round
Date Sun, 09 Mar 2008 08:01:46 GMT

    [ https://issues.apache.org/jira/browse/LUCENE-1209?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12576723#action_12576723 ]

Doron Cohen commented on LUCENE-1209:
-------------------------------------

Mark, you are right that setConfig is called just once, at start.
For setting properties by round, at least, this should be sufficient.
I wonder why this doesn't work for you.

I tried it with this algorithm:

{code}
compound=true

analyzer=org.apache.lucene.analysis.standard.StandardAnalyzer
directory=RamDirectory

doc.stored=true
doc.tokenized=true
doc.term.vector=termVec:false:true
doc.add.log.step=10

doc.maker=org.apache.lucene.benchmark.byTask.feeds.SimpleDocMaker
task.max.depth.log=1

{

    { "Populate"
        CreateIndex
        { AddDoc > : 50
        Optimize
        CloseIndex
    >

    ResetSystemErase
    NewRound

} : 2

RepSumByName
RepSelectByPref Populate
{code}

And got this output:
{code}
 Working Directory: work
 Running algorithm from: conf\termVecByRound.alg
 ------------> config properties:
 analyzer = org.apache.lucene.analysis.standard.StandardAnalyzer
 compound = true
 directory = RamDirectory
 doc.add.log.step = 10
 doc.maker = org.apache.lucene.benchmark.byTask.feeds.SimpleDocMaker
 doc.stored = true
 doc.term.vector = termVec:false:true
 doc.tokenized = true
 task.max.depth.log = 1
 work.dir = work
 -------------------------------
 ------------> algorithm:
 Seq {
     Seq_2 {
         Populate {
             CreateIndex
             Seq_50 {
                 AddDoc
             > * 50
             Optimize
             CloseIndex
         >
         ResetSystemErase
         NewRound
     } * 2
     RepSumByName
     RepSelectByPref Populate
 }
 
 ------------> starting task: Seq
 ------------> starting task: Seq_2
 --> 0.1 sec: main processed (add) 10 docs
 --> 0.1 sec: main processed (add) 20 docs
 --> 0.11 sec: main processed (add) 30 docs
 --> 0.11 sec: main processed (add) 40 docs
 --> 0.11 sec: main processed (add) 50 docs
 ------------> SimpleDocMaker statistics (0): 
 num docs added since last inputs reset:                   50
 total bytes added since last inputs reset:             42,150
 
 
 
 --> Round 0-->1:   doc.term.vector:false-->true
 
 --> 0 sec: main processed (add) 60 docs
 --> 0 sec: main processed (add) 70 docs
 --> 0 sec: main processed (add) 80 docs
 --> 0 sec: main processed (add) 90 docs
 --> 0 sec: main processed (add) 100 docs
 ------------> SimpleDocMaker statistics (1): 
 num docs added since last inputs reset:                   50
 total bytes added since last inputs reset:             42,150
 
 
 
 --> Round 1-->2:   doc.term.vector:true-->false
 
 
 ------------> Report Sum By (any) Name (2 about 3 out of 4)
 Operation   round termVec   runCnt   recsPerRun        rec/s  elapsedSec    avgUsedMem   avgTotalMem
 Seq_2           0   false        1          106        530.0        0.20       639,912     5,177,344
 Populate        -       -        2           53        706.7        0.15       839,552     5,177,344
 
 
 ------------> Report Select By Prefix (Populate) (2 about 2 out of 4)
 Operation   round termVec   runCnt   recsPerRun        rec/s  elapsedSec    avgUsedMem   avgTotalMem
 Populate        0   false        1           53        378.6        0.14       858,080     5,177,344
 Populate -  -   1 -  true -  -   1 -  -  -   53 -  - 5,300.0 -  -   0.01 -  -  821,024 -  - 5,177,344
 
 ####################
 ###  D O N E !!! ###
 ####################
{code}

Note in particular this line:
{code}
[java] --> Round 0-->1:   doc.term.vector:false-->true 
{code}

Note that a *NewRound* command is required in order for the round number to change. 
{code}
    NewRound
{code}

One possible cause of the error is that, for multi-valued properties, the property definition
parser requires a name prefix before the values.
So this would not work as expected:
{code}
doc.term.vector=false:true
{code}

But this will work:
{code}
doc.term.vector=termVec:false:true
{code}
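
To illustrate why the prefix matters, here is a minimal Java sketch of how such a value could be split and selected by round. It is not the benchmark's actual Config code: the class RoundValues and its methods parse and valueForRound are hypothetical names, and the sketch assumes that the first colon-separated field is always consumed as the name and that the remaining values are reused cyclically by round (which matches the output above, where the value returns to false after round 1).

```java
import java.util.Arrays;

/** Hypothetical sketch only; not the benchmark's real Config class. */
final class RoundValues {
    final String shortName;  // label for the property, e.g. "termVec" (null if single-valued)
    final String[] values;   // one value per round, reused cyclically

    RoundValues(String shortName, String[] values) {
        this.shortName = shortName;
        this.values = values;
    }

    /** Assumption: in "a:b:c" the first field is always taken as the name. */
    static RoundValues parse(String raw) {
        String[] parts = raw.split(":");
        if (parts.length < 2) {
            // No colon: a plain single-valued property.
            return new RoundValues(null, new String[] { raw });
        }
        return new RoundValues(parts[0], Arrays.copyOfRange(parts, 1, parts.length));
    }

    /** Assumption: values wrap around when there are more rounds than values. */
    String valueForRound(int round) {
        return values[round % values.length];
    }

    public static void main(String[] args) {
        RoundValues withPrefix = parse("termVec:false:true");
        RoundValues withoutPrefix = parse("false:true");
        for (int round = 0; round < 3; round++) {
            System.out.println("round " + round
                    + ": with prefix -> " + withPrefix.valueForRound(round)
                    + ", without prefix -> " + withoutPrefix.valueForRound(round));
        }
    }
}
```

Under these assumptions, the definition without a prefix loses its first value: "false" is swallowed as the name, leaving "true" as the only value for every round.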

If it still doesn't work for you, can you post the algorithm here?

> If setConfig(Config config) is called in resetInputs(), you can turn term vectors off and on by round
> -----------------------------------------------------------------------------------------------------
>
>                 Key: LUCENE-1209
>                 URL: https://issues.apache.org/jira/browse/LUCENE-1209
>             Project: Lucene - Java
>          Issue Type: Improvement
>          Components: contrib/benchmark
>    Affects Versions: 2.4
>            Reporter: Mark Miller
>            Priority: Trivial
>         Attachments: reset_config.patch
>
>
> I want to be able to run one benchmark that tests things using term vectors and not using term vectors.
> Currently this is not easy because you cannot specify term vectors per round.
> While you do have to create a new index per round, this automation is preferable to me in comparison to running two separate tests.
> If it doesn't affect anything else, it would be great to have setConfig(Config config) called in BasicDocMaker.resetInputs(). This would keep the term vector options up to date per round if you reset.
> - Mark

-- 
This message is automatically generated by JIRA.
-
You can reply to this email to add a comment to the issue online.


---------------------------------------------------------------------
To unsubscribe, e-mail: java-dev-unsubscribe@lucene.apache.org
For additional commands, e-mail: java-dev-help@lucene.apache.org

