From: Grant Ingersoll
Subject: Re: contrib/benchmark questions
Date: Mon, 19 Mar 2007 16:10:16 -0400
To: java-user@lucene.apache.org

Thanks for the reply, Doron.
I knew this email was targeted for you, but thought it would be good to
add it to the user record.

On Mar 19, 2007, at 2:30 PM, Doron Cohen wrote:

> Grant Ingersoll wrote on 18/03/2007 10:16:14:
>
>> I'm using contrib/benchmark to do some tests for my ApacheCon talk
>> and have some questions.
>>
>> 1. In looking at micro-standard.alg, it seems like not all braces are
>> closed. Is a line ending a separator too?
>
> '>' can be used as an alternative closing character in place of either
> '}' or ']', with the semantics "do not collect/report separate
> statistics for the contained tasks." See "Statistic recording
> elimination" in
> http://lucene.apache.org/java/docs/api/org/apache/lucene/benchmark/byTask/package-summary.html

So, if I am understanding correctly:

>> "SearchSameRdr" Search > : 5000

means don't collect individual stats for SearchSameRdr, but do whatever
that task does 5000 times, right?

>
>> 2. Is there any way to dump out what params are supported by the
>> various tasks? I am especially uncertain about the Search-related
>> tasks.
>
> Search-related tasks do not take args. Perhaps a task should throw an
> exception if a param is set but not supported; I think I'll add that.
> Currently only AddDoc, DeleteDoc and SetProp take args. The section
> "Command parameter" in
> http://lucene.apache.org/java/docs/api/org/apache/lucene/benchmark/byTask/package-summary.html
> which describes this is incomplete; I will fix it to reflect that.
>
> Which query arguments do you have in mind?

Never mind, I was confused by the ": XXXX" parameters after the '>'.

>
>> 3. Is there any way to dump out the stats as a CSV file or something?
>> Would I implement a Task for this? Ultimately, I want to be able to
>> create a graph in Excel that shows tradeoffs between speed and
>> memory.
>
> Yes, implementing a report task would be the way.
> ... but when I look at how I implemented these reports, all the work
> is done in the class Points.
> It seems it should be modified a little, with more thought given to
> making reports easier to extend.

I may take a crack at it, but the deadline for the talk is looming.

>
>> 4. Is there a way to set how many tabs occur between columns in the
>> final report? The merge and buffer factors get hard to read for
>> larger values.
>
> There's no general tabbing control; it can be added if required. But
> for the automatically added columns it is not required: just modify
> the name of the column and it will fit, e.g. use "merge:10:100" to get
> a 5-character column, or "merging:10:100" for 7, etc. Also see "Index
> work parameters" under "Benchmark properties" in
> http://lucene.apache.org/java/docs/api/org/apache/lucene/benchmark/byTask/package-summary.html
>
>> 5. Below is my "alg" file, any tips? What I am trying to do is show
>> the tradeoffs of merge factor and max buffered docs and how they
>> relate to memory and indexing time. I want to process all the
>> documents in the Reuters benchmark collection, not the 2000 in the
>> micro-standard. I don't want any pauses, and for now I am happy doing
>> things in serial. I think it is doing what I want, but am not 100%
>> certain.
>>
>
> Yes, it seems correct to me. What I usually do to verify a new alg is
> to run it first with very small numbers, e.g. 10 instead of 22000,
> and examine the log. A few comments:
> - You can specify a larger number than 22000 and the DocMaker will
>   iterate and create new docs from the same input again.
> - Since you are interested in memory stats: all the rounds run in a
>   single program (the same JVM run), which usually means that what you
>   see depends very much on the GC behavior of the specific VM you are
>   using. If it does not release memory to the OS (most likely), you
>   would not be able to notice that round i+1 used less memory than
>   round i.
>   It would probably be better for something like this to put the
>   "round" logic in an Ant script, invoking each round in a separate
>   new exec. But then it gets more complicated to produce a final stats
>   report containing all rounds. What do you think about this?

Good to know. Perhaps a GarbageCollectionTask is needed?

> - It seems you are only interested in the indexing performance, so you
>   can remove (or comment out) the search part.
> - If you are also interested in the search part, note that as
>   written, the last four search-related tasks always use a new reader
>   (opening/closing 950 readers in this test).

OK, search is the second part; I just focused on indexing first. I'm
trying to address common questions/issues people have with performance
in these two areas. So, I should wrap those tasks in an
OpenReader/CloseReader?

We may also want to consider making this an XML-based type of
configuration...

Thanks for your help. I will probably have a few more questions over
the next few days.

-Grant
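Doron's proposal under point 2, that a task should throw an exception when a parameter is set but not supported, could look roughly like the following. All class and method names here (SketchTask, supportsParams, SearchSketch, AddDocSketch) are hypothetical stand-ins, not the actual contrib/benchmark classes; only the idea of failing fast on an unsupported parameter comes from the thread.

```java
// Hypothetical model of benchmark tasks that reject parameters they cannot use.
// These are NOT the contrib/benchmark classes; names are illustrative only.
abstract class SketchTask {
    private final String name;
    private String params;

    SketchTask(String name) {
        this.name = name;
    }

    /** Tasks that accept an argument override this to return true. */
    boolean supportsParams() {
        return false;
    }

    /** Fails fast instead of silently ignoring an unsupported parameter. */
    void setParams(String params) {
        if (!supportsParams()) {
            throw new UnsupportedOperationException(
                name + " does not support command-line parameters");
        }
        this.params = params;
    }

    String getParams() {
        return params;
    }

    String getName() {
        return name;
    }
}

// Stand-in for a search task: takes no args, per the thread.
final class SearchSketch extends SketchTask {
    SearchSketch() {
        super("Search");
    }
}

// Stand-in for a task that does take an arg, like AddDoc in the thread.
final class AddDocSketch extends SketchTask {
    AddDocSketch() {
        super("AddDoc");
    }

    @Override
    boolean supportsParams() {
        return true;
    }
}
```

With this design, a typo or a misplaced argument in an alg file would surface as an error at parse time rather than as a silently ignored setting.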
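Doron's answer to point 3 was that a report task is the way to get stats out as CSV, with the caveat that the existing report formatting lives in the class Points. This sketch deliberately avoids guessing at the contrib/benchmark task API and shows only the CSV-writing part. It assumes the per-round numbers have already been collected into a hypothetical RoundResult record (its fields are illustrative, not a benchmark class) and emits one row per round, so Excel can chart indexing time against memory.

```java
import java.util.List;

// Hypothetical per-round summary. These fields are illustrative assumptions,
// not a contrib/benchmark class.
record RoundResult(int round, int mergeFactor, int maxBufferedDocs,
                   long elapsedMillis, long usedMemBytes) {}

final class CsvReport {
    // Header row; the column order matches the RoundResult components.
    static final String HEADER =
        "round,mergeFactor,maxBufferedDocs,elapsedMillis,usedMemBytes";

    private CsvReport() {}

    /** Renders a header row followed by one comma-separated row per round. */
    static String toCsv(List<RoundResult> results) {
        StringBuilder sb = new StringBuilder(HEADER).append('\n');
        for (RoundResult r : results) {
            sb.append(r.round()).append(',')
              .append(r.mergeFactor()).append(',')
              .append(r.maxBufferedDocs()).append(',')
              .append(r.elapsedMillis()).append(',')
              .append(r.usedMemBytes()).append('\n');
        }
        return sb.toString();
    }
}
```

Writing the returned string to a .csv file (for example with java.nio.file.Files.writeString) gives Excel a file it can open directly. All values are plain integers, so no quoting or locale-specific number formatting is needed.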
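Doron's tip in point 4 relies on the report column taking its header from the property name. What follows is a hypothetical re-implementation of that idea, not the benchmark's own code. It assumes that a spec such as "merge:10:100" splits into a column name ("merge") and per-round values, that round r uses value r modulo the number of values, and that a cell is right-aligned to at least the width of the name. Under those assumptions, renaming "merge" to "merging" widens the column from 5 to 7 characters.

```java
import java.util.Arrays;

// Hypothetical model of a per-round property written as "name:v1:v2:...".
// It mirrors the behavior described in the thread under stated assumptions;
// it is not the contrib/benchmark implementation.
final class RoundProperty {
    final String columnName;
    private final String[] values;

    private RoundProperty(String columnName, String[] values) {
        this.columnName = columnName;
        this.values = values;
    }

    /** Splits "merge:10:100" into column name "merge" and values {"10", "100"}. */
    static RoundProperty parse(String spec) {
        String[] parts = spec.split(":");
        if (parts.length < 2 || parts[0].isEmpty()) {
            throw new IllegalArgumentException("expected name:v1[:v2...], got " + spec);
        }
        return new RoundProperty(parts[0], Arrays.copyOfRange(parts, 1, parts.length));
    }

    /** Assumption: rounds cycle through the values, wrapping around. */
    String valueForRound(int round) {
        return values[round % values.length];
    }

    /** Right-aligns the value in a column at least as wide as the name. */
    String cell(int round) {
        String v = valueForRound(round);
        return " ".repeat(Math.max(0, columnName.length() - v.length())) + v;
    }
}
```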
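On the GarbageCollectionTask idea: a hypothetical helper (the HeapSnapshot name is made up here, not an existing benchmark task) could request a collection before sampling heap usage through java.lang.Runtime. System.gc() is only a hint, and even when a collection happens the JVM usually keeps the freed heap rather than returning it to the OS. So this narrows, but does not remove, the single-JVM caveat Doron raised; running each round in a separate JVM remains the more reliable option.

```java
// Hypothetical helper sketching what a "GarbageCollectionTask" might record.
final class HeapSnapshot {
    final long usedBytes;   // heap in use: totalMemory() - freeMemory()
    final long totalBytes;  // heap currently reserved by the JVM

    private HeapSnapshot(long usedBytes, long totalBytes) {
        this.usedBytes = usedBytes;
        this.totalBytes = totalBytes;
    }

    /** Requests garbage collection (a hint only), then samples heap usage. */
    static HeapSnapshot afterGc(int gcRequests) {
        Runtime rt = Runtime.getRuntime();
        for (int i = 0; i < gcRequests; i++) {
            System.gc(); // may be ignored, e.g. under -XX:+DisableExplicitGC
        }
        long total = rt.totalMemory();
        long free = rt.freeMemory();
        return new HeapSnapshot(total - free, total);
    }
}
```

Calling HeapSnapshot.afterGc(2) at the end of each round and recording usedBytes next to that round's elapsed time would give a rough memory figure per round, subject to the GC caveats above.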