lucene-dev mailing list archives

From "Mark Miller (JIRA)" <j...@apache.org>
Subject [jira] Commented: (LUCENE-1209) If setConfig(Config config) is called in resetInputs(), you can turn term vectors off and on by round
Date Sun, 09 Mar 2008 13:45:46 GMT

    [ https://issues.apache.org/jira/browse/LUCENE-1209?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12576770#action_12576770 ]

Mark Miller commented on LUCENE-1209:
-------------------------------------

My algorithm is below.

I see "Round 0-->1:   doc.term.vector:false-->true" as well. However, if I put a debug
print on what is returned from public boolean get(String name, boolean dflt), it is only
ever called once for "doc.term.vector", and likewise for the other properties read in setConfig.

More importantly, let's say I set it to true:false. If I look at the work/index directory
on the second run, there are certainly term vectors. That's how I noticed this to begin with:
I was looking at the index and saw the term vector files on every round. It's possible I have
something messed up, but every time I run through everything again, it really does not
seem to be working. If I set term vectors to false:true, they are never made in any round.

- Mark


<code>
ram.flush.mb=flush:32:32
compound=false

analyzer=org.apache.lucene.analysis.standard.StandardAnalyzer
directory=FSDirectory

doc.stored=true
doc.tokenized=tok:false:true
doc.term.vector=vec:true:false
doc.term.vector.offsets=tvo:false:true
doc.term.vector.positions=tvp:false:true
doc.add.log.step=2000

docs.dir=reuters-out

doc.maker=org.apache.lucene.benchmark.byTask.feeds.ReutersDocMaker

query.maker=org.apache.lucene.benchmark.byTask.feeds.ReutersQueryMaker

# task at this depth or less would print when they start
task.max.depth.log=2

log.queries=true
# -------------------------------------------------------------------------------------

{ "Rounds"
      
    ResetSystemErase

        CreateIndex
        { "MAddDocs" AddDoc(60) } : 20000
        Optimize
        CloseIndex
  
    OpenReader
      { "SrchTrvRetNewRdr" SearchTravRet(10) > : 1000
    CloseReader
    OpenReader
      { "SearchHlgtSameRdr" SearchTravRetHighlight(size[20],highlight[20],mergeContiguous[true],maxFrags[0],fields[body]) > : 1000

    CloseReader

    RepSumByPref SearchHlgtSameRdr

    NewRound

} : 2

RepSumByNameRound
RepSumByName
RepSumByPrefRound MAddDocs
</code>
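
For readers unfamiliar with the per-round syntax used in the properties above (for example doc.term.vector=vec:true:false), here is a minimal sketch of how such a value is expected to resolve. It assumes the first token is the column name used in reports and that the remaining values are reused cyclically by round number. RoundValueSketch and valueForRound are hypothetical names for illustration only; this is not the benchmark's own Config class.

```java
public class RoundValueSketch {

    /**
     * Resolves a per-round boolean property such as "vec:true:false".
     * Assumption: token 0 is a report column name, and the remaining
     * tokens are the values, reused cyclically as the round increases.
     */
    static boolean valueForRound(String raw, int round) {
        if (raw.indexOf(':') < 0) {
            // Plain property: the same value applies in every round.
            return Boolean.parseBoolean(raw);
        }
        String[] tokens = raw.split(":");
        int valueCount = tokens.length - 1; // skip the column name
        if (valueCount == 0) {
            throw new IllegalArgumentException("no per-round values in: " + raw);
        }
        return Boolean.parseBoolean(tokens[1 + (round % valueCount)]);
    }

    public static void main(String[] args) {
        String termVector = "vec:true:false"; // value taken from the algorithm above
        for (int round = 0; round < 2; round++) {
            System.out.println("round " + round + " doc.term.vector="
                    + valueForRound(termVector, round));
        }
    }
}
```

Under these assumptions, vec:true:false should give term vectors in round 0 and none in round 1, which is the behavior the index on disk does not show.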

> If setConfig(Config config) is called in resetInputs(), you can turn term vectors off and on by round
> -----------------------------------------------------------------------------------------------------
>
>                 Key: LUCENE-1209
>                 URL: https://issues.apache.org/jira/browse/LUCENE-1209
>             Project: Lucene - Java
>          Issue Type: Improvement
>          Components: contrib/benchmark
>    Affects Versions: 2.4
>            Reporter: Mark Miller
>            Priority: Trivial
>         Attachments: reset_config.patch
>
>
> I want to be able to run one benchmark that tests things using term vectors and not using
> term vectors.
> Currently this is not easy because you cannot specify term vectors per round.
> While you do have to create a new index per round, this automation is preferable to me
> in comparison to running two separate tests.
> If it doesn't affect anything else, it would be great to have setConfig(Config config)
> called in BasicDocMaker.resetInputs(). This would keep the term vector options up to date
> per round if you reset.
> - Mark
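
The request quoted above can be illustrated with a small, self-contained sketch. It is not the real BasicDocMaker: ResetConfigSketch, DocMakerSketch, and the round-to-flag function are hypothetical stand-ins, and only the method names setConfig and resetInputs mirror the ones named in the issue. The sketch assumes the doc maker caches its term vector flag when it is configured, so the flag only follows the round if resetInputs() reads the configuration again.

```java
import java.util.function.IntPredicate;

public class ResetConfigSketch {

    /** Hypothetical stand-in for a doc maker that caches its config flags. */
    static class DocMakerSketch {
        private final IntPredicate termVectorByRound; // round -> doc.term.vector value
        private final boolean rereadOnReset;          // true = behavior proposed in the issue
        private boolean termVector;                   // cached flag used when building docs

        DocMakerSketch(IntPredicate termVectorByRound, boolean rereadOnReset) {
            this.termVectorByRound = termVectorByRound;
            this.rereadOnReset = rereadOnReset;
            setConfig(0); // flags are read once at setup
        }

        /** Reads the flag for the given round into the cached field. */
        void setConfig(int round) {
            termVector = termVectorByRound.test(round);
        }

        /** Called at the start of each round. */
        void resetInputs(int round) {
            if (rereadOnReset) {
                setConfig(round); // refresh the cached flag for the new round
            }
        }

        boolean usesTermVectors() {
            return termVector;
        }
    }

    public static void main(String[] args) {
        IntPredicate trueThenFalse = round -> round % 2 == 0; // like vec:true:false
        DocMakerSketch cached = new DocMakerSketch(trueThenFalse, false);
        DocMakerSketch refreshed = new DocMakerSketch(trueThenFalse, true);
        cached.resetInputs(1);
        refreshed.resetInputs(1);
        System.out.println("cached flag in round 1:    " + cached.usesTermVectors());
        System.out.println("refreshed flag in round 1: " + refreshed.usesTermVectors());
    }
}
```

In this sketch the cached variant keeps the round 0 value in every later round, while the variant that re-reads on reset follows the per-round setting.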

-- 
This message is automatically generated by JIRA.
-
You can reply to this email to add a comment to the issue online.


---------------------------------------------------------------------
To unsubscribe, e-mail: java-dev-unsubscribe@lucene.apache.org
For additional commands, e-mail: java-dev-help@lucene.apache.org

