Message-ID: <469426609.1205070706210.JavaMail.jira@brutus>
Date: Sun, 9 Mar 2008 06:51:46 -0700 (PDT)
From: "Mark Miller (JIRA)"
To: java-dev@lucene.apache.org
Subject: [jira] Issue Comment Edited: (LUCENE-1209) If setConfig(Config config) is called in resetInputs(), you can turn term vectors off and on by round
In-Reply-To: <979731930.1204942906322.JavaMail.jira@brutus>

[ https://issues.apache.org/jira/browse/LUCENE-1209?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12576770#action_12576770 ]

markrmiller@gmail.com edited comment on LUCENE-1209 at 3/9/08 6:51 AM:
-------------------------------------------------------------

My algorithm is below. I see "Round 0-->1: doc.term.vector:false-->true" as well. However, if I put a debug print on what is returned from public boolean get(String name, boolean dflt), it is only ever called once for "doc.term.vector", as well as for the other properties read in setConfig.

More importantly, let's say I set it to true:false. If I look at the work/index directory on the second run, there are certainly term vectors. That's how I noticed this to begin with: I was looking at the index and saw the term vector files on every round. It's possible I have something messed up, but every time I run through everything again, it really does not seem to be working. If I set term vectors to false:true, they are never made in any round.

>>Mark you are right that setConfig is called just once, at start.
>>At least for setting properties by round this should be sufficient.
>>I wonder why this doesn't work for you.

I think this admits the problem, right? The get for every property in setConfig is only called once. That loads up the "false:true", returns false, and sets up "true" to be returned on the next call. The next time you call get on Config you will get the "true", but there is no next time. It's only done once, so it shows up right in the output "Round 0-->1: doc.term.vector:false-->true", but get is only ever called once and so only loads false.

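To make the symptom concrete, here is a minimal sketch of the situation as I understand it. RoundConfigSketch and CachingMakerSketch are hypothetical stand-ins written for this example, not the benchmark's real Config or BasicDocMaker, and the per-round lookup is an assumption about how by-round values behave. The point is only that a consumer which reads a by-round property once keeps the round-0 value, even though the config itself reports the new value after the round changes.

```java
import java.util.HashMap;
import java.util.Map;

// Hypothetical stand-in for a by-round config; NOT the real benchmark Config class.
class RoundConfigSketch {
    private final Map<String, boolean[]> valuesByRound = new HashMap<>();
    private int round = 0;

    // Registers a colon-separated list such as "false:true", one value per round.
    void set(String name, String spec) {
        String[] parts = spec.split(":");
        boolean[] vals = new boolean[parts.length];
        for (int i = 0; i < parts.length; i++) {
            vals[i] = Boolean.parseBoolean(parts[i]);
        }
        valuesByRound.put(name, vals);
    }

    // Returns the value for the current round (assumed to wrap around), or dflt if unset.
    boolean get(String name, boolean dflt) {
        boolean[] vals = valuesByRound.get(name);
        return vals == null ? dflt : vals[round % vals.length];
    }

    void newRound() {
        round++;
    }
}

// Hypothetical doc maker that reads the flag once and caches it in a field.
class CachingMakerSketch {
    private boolean termVectors;

    void setConfig(RoundConfigSketch config) {
        termVectors = config.get("doc.term.vector", false);
    }

    boolean usesTermVectors() {
        return termVectors;
    }
}

class ByRoundDemo {
    public static void main(String[] args) {
        RoundConfigSketch config = new RoundConfigSketch();
        config.set("doc.term.vector", "false:true");

        CachingMakerSketch maker = new CachingMakerSketch();
        maker.setConfig(config); // read once, during round 0
        config.newRound();       // the config now reports the round-1 value

        System.out.println("config says: " + config.get("doc.term.vector", false)); // true
        System.out.println("maker uses:  " + maker.usesTermVectors());              // false
    }
}
```

In this sketch the round-change log line would be accurate about the config, while the doc maker never sees the change, which matches what I observe in the index.
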
- Mark

{code}
ram.flush.mb=flush:32:32
compound=false

analyzer=org.apache.lucene.analysis.standard.StandardAnalyzer
directory=FSDirectory

doc.stored=true
doc.tokenized=tok:false:true
doc.term.vector=vec:true:false
doc.term.vector.offsets=tvo:false:true
doc.term.vector.positions=tvp:false:true
doc.add.log.step=2000

docs.dir=reuters-out

doc.maker=org.apache.lucene.benchmark.byTask.feeds.ReutersDocMaker

query.maker=org.apache.lucene.benchmark.byTask.feeds.ReutersQueryMaker

# task at this depth or less would print when they start
task.max.depth.log=2

log.queries=true
# -------------------------------------------------------------------------------------

{ "Rounds"

    ResetSystemErase

    CreateIndex
    { "MAddDocs" AddDoc(60) } : 20000
    Optimize
    CloseIndex

    OpenReader
    { "SrchTrvRetNewRdr" SearchTravRet(10) > : 1000
    CloseReader

    OpenReader
    { "SearchHlgtSameRdr" SearchTravRetHighlight(size[20],highlight[20],mergeContiguous[true],maxFrags[0],fields[body]) > : 1000
    CloseReader

    RepSumByPref SearchHlgtSameRdr

    NewRound

} : 2

RepSumByNameRound
RepSumByName
RepSumByPrefRound MAddDocs
{code}

> If setConfig(Config config) is called in resetInputs(), you can turn term vectors off and on by round
> -----------------------------------------------------------------------------------------------------
>
>                 Key: LUCENE-1209
>                 URL: https://issues.apache.org/jira/browse/LUCENE-1209
>             Project: Lucene - Java
>          Issue Type: Improvement
>          Components: contrib/benchmark
>    Affects Versions: 2.4
>            Reporter: Mark Miller
>            Priority: Trivial
>         Attachments: reset_config.patch
>
>
> I want to be able to run one benchmark that tests things using term vectors and not using term vectors.
> Currently this is not easy because you cannot specify term vectors per round.
> While you do have to create a new index per round, this automation is preferable to me in comparison to running two separate tests.
> If it doesn't affect anything else, it would be great to have setConfig(Config config) called in BasicDocMaker.resetInputs(). This would keep the term vector options up to date per round if you reset.
> - Mark
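
The change requested in the issue description can be sketched roughly as follows. This is an illustration only, not the contents of reset_config.patch: ResettingMakerSketch is a hypothetical class, and a BooleanSupplier stands in for the benchmark Config so the example stays self-contained. The idea is simply that resetInputs() re-applies the config, so the cached flag is refreshed whenever the inputs are reset for a new round.

```java
import java.util.function.BooleanSupplier;

// Hypothetical doc maker; the real BasicDocMaker holds more state than this.
class ResettingMakerSketch {
    private BooleanSupplier termVectorSetting; // assumption: stands in for the benchmark Config
    private boolean termVectors;

    // Reads the setting and caches it, as a doc maker might do at startup.
    void setConfig(BooleanSupplier setting) {
        this.termVectorSetting = setting;
        this.termVectors = setting.getAsBoolean();
    }

    // Proposed behavior: re-apply the config on reset so per-round values are picked up.
    void resetInputs() {
        setConfig(termVectorSetting);
    }

    boolean usesTermVectors() {
        return termVectors;
    }
}

class ResetDemo {
    public static void main(String[] args) {
        boolean[] byRound = {false, true}; // plays the role of "false:true"
        int[] round = {0};                 // mutable round counter captured by the lambda

        ResettingMakerSketch maker = new ResettingMakerSketch();
        maker.setConfig(() -> byRound[round[0] % byRound.length]);
        System.out.println("round 0: " + maker.usesTermVectors()); // false

        round[0]++;          // corresponds to starting a new round
        maker.resetInputs(); // refreshes the cached flag
        System.out.println("round 1: " + maker.usesTermVectors()); // true
    }
}
```

Whether re-reading every property on each reset has side effects elsewhere in the doc maker is exactly the "if it doesn't affect anything else" caveat above; the sketch does not address that.
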