Subject: svn commit: r835677 - in /lucene/java/trunk/contrib/benchmark: CHANGES.txt src/java/org/apache/lucene/benchmark/byTask/feeds/TrecContentSource.java
Date: Fri, 13 Nov 2009 00:47:16 -0000
To: java-commits@lucene.apache.org
From: rmuir@apache.org
Reply-To: java-dev@lucene.apache.org
Message-Id: <20091113004716.2EF5E2388878@eris.apache.org>

Author: rmuir
Date: Fri Nov 13 00:47:15 2009
New Revision: 835677

URL: http://svn.apache.org/viewvc?rev=835677&view=rev
Log:
LUCENE-2059: allow TrecContentSource not to change the docname

Modified:
    lucene/java/trunk/contrib/benchmark/CHANGES.txt
    lucene/java/trunk/contrib/benchmark/src/java/org/apache/lucene/benchmark/byTask/feeds/TrecContentSource.java

Modified: lucene/java/trunk/contrib/benchmark/CHANGES.txt
URL: http://svn.apache.org/viewvc/lucene/java/trunk/contrib/benchmark/CHANGES.txt?rev=835677&r1=835676&r2=835677&view=diff
==============================================================================
--- lucene/java/trunk/contrib/benchmark/CHANGES.txt (original)
+++ lucene/java/trunk/contrib/benchmark/CHANGES.txt Fri Nov 13 00:47:15 2009
@@ -5,6 +5,13 @@
 $Id:$
 
 11/12/2009
+  LUCENE-2059: allow TrecContentSource not to change the docname.
+  Previously, it would always append the iteration # to the docname.
+  With the new option content.source.excludeIteration, you can disable this.
+  The resulting index can then be used with the quality package to measure
+  relevance. (Robert Muir)
+
+11/12/2009
   LUCENE-2058: specify trec_eval submission output from the command line.
   Previously, 4 arguments were required, but the third was unused.
   The third argument is now the desired location of submission.txt (Robert Muir)

Modified: lucene/java/trunk/contrib/benchmark/src/java/org/apache/lucene/benchmark/byTask/feeds/TrecContentSource.java
URL: http://svn.apache.org/viewvc/lucene/java/trunk/contrib/benchmark/src/java/org/apache/lucene/benchmark/byTask/feeds/TrecContentSource.java?rev=835677&r1=835676&r2=835677&view=diff
==============================================================================
--- lucene/java/trunk/contrib/benchmark/src/java/org/apache/lucene/benchmark/byTask/feeds/TrecContentSource.java (original)
+++ lucene/java/trunk/contrib/benchmark/src/java/org/apache/lucene/benchmark/byTask/feeds/TrecContentSource.java Fri Nov 13 00:47:15 2009
@@ -48,6 +48,7 @@
  *   • html.parser - specifies the {@link HTMLParser} class to use for
  * parsing the TREC documents content (default=DemoHTMLParser).
  *   • content.source.encoding - if not specified, ISO-8859-1 is used.
+ *   • content.source.excludeIteration - if true, do not append iteration number to docname
  *
  */
 public class TrecContentSource extends ContentSource {
@@ -91,6 +92,7 @@
   BufferedReader reader;
   int iteration = 0;
   HTMLParser htmlParser;
+  private boolean excludeDocnameIteration;
 
   private DateFormatInfo getDateFormatInfo() {
     DateFormatInfo dfi = dateFormats.get();
@@ -256,7 +258,8 @@
     read(docBuf, DOCNO, true, false, null);
     name = docBuf.substring(DOCNO.length(), docBuf.indexOf(TERMINATING_DOCNO,
         DOCNO.length()));
-    name = name + "_" + iteration;
+    if (!excludeDocnameIteration)
+      name = name + "_" + iteration;
 
     // 3. skip until doc header
     docBuf.setLength(0);
@@ -342,6 +345,7 @@
     if (encoding == null) {
       encoding = "ISO-8859-1";
     }
+    excludeDocnameIteration = config.get("content.source.excludeIteration", false);
   }
 
 }
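
The naming rule changed by this commit can be sketched in isolation. The following is a minimal, standalone illustration, not code from TrecContentSource itself: the class DocnameSketch, the method buildDocName, and the sample DOCNO value are all hypothetical. It only mirrors the conditional in the diff above, where the "_" + iteration suffix is skipped when content.source.excludeIteration is true. TREC relevance judgments identify documents by their original DOCNO, which is presumably why the unsuffixed name is needed for use with the quality package.

```java
// Hypothetical standalone sketch of the docname rule shown in the diff above.
// Nothing here is part of the Lucene benchmark API.
final class DocnameSketch {

    // Returns the docname as the patched code would build it:
    // the raw DOCNO when excludeIteration is true, DOCNO + "_" + iteration otherwise.
    static String buildDocName(String docno, int iteration, boolean excludeIteration) {
        if (!excludeIteration) {
            return docno + "_" + iteration;
        }
        return docno;
    }

    public static void main(String[] args) {
        // Default behavior (option absent or false): iteration suffix is appended.
        System.out.println(buildDocName("DOC-1", 2, false)); // prints DOC-1_2
        // With content.source.excludeIteration=true: the DOCNO is left unchanged.
        System.out.println(buildDocName("DOC-1", 2, true));  // prints DOC-1
    }
}
```

In a benchmark run, the option would be set as content.source.excludeIteration=true alongside the other content source properties. The default of false preserves the previous behavior.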