From: "Zeynep P."
To: java-user@lucene.apache.org
Date: Wed, 17 Oct 2012 06:48:05 -0700 (PDT)
Message-ID: <1350481685564-4014238.post@n3.nabble.com>
Subject: Lucene 4.0 benchmark bug?

Hi to all,

I started using benchmark 4.0 to create submission report files with the following code:

    BufferedReader br = new BufferedReader(fr);
    QualityQuery qqs[] = qReader.readQueries(br);
    QualityQueryParser qqParser = new SimpleQQParser("title", "body");
    QualityBenchmark qrun = new QualityBenchmark(qqs, qqParser, searcher, "docname");
    SubmissionReport submitLog = new SubmissionReport(loggertest, "test");
    QualityStats stats[] = qrun.execute(null, submitLog, null);

My index was created with Lucene 3.6. I use the LA Times collection with topics 401-450. With benchmark 3.6 there is no problem. With benchmark 4.0, however, I get results only for the first query, 401, which is "foreign minorities, Germany".

When I debug the code, the boolean query generated in SimpleQQParser is "body:foreign", without the other keywords. Stepping further, the problem seems to arise in QueryParserBase.newFieldQuery, which returns null for the remaining keywords of that query and for all of the other queries. I patched the code for my own ad hoc use, but otherwise I don't know how to fix it. Does this also happen to anyone else?

Second problem: on the same collection I get MAP = 0.17 with the default similarity and MAP = 0.07 with the Lucene 4.0 BM25 similarity (b=0.75, k1=1.2). I get MAP = 0.14 with a BM25 implementation based on http://ipl.cs.aueb.gr/stougianni/bm25_2.html. However, the literature reports a MAP of around 0.25 for this collection with the BM25 scoring function. Has anyone evaluated the different similarities, and can you share the results?

Best Regards,
ZP

--
View this message in context: http://lucene.472066.n3.nabble.com/Lucene-4-0-benchmark-bug-tp4014238.html
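
For anyone comparing the MAP figures quoted above, here is a minimal, self-contained sketch of how mean average precision is commonly computed. It is not the benchmark module's code. The class and method names are hypothetical, and it assumes that each ranked list contains no duplicate document names, that average precision is normalized by the number of judged relevant documents, and that the mean is taken over every judged topic.

```java
import java.util.List;
import java.util.Map;
import java.util.Set;

// Hypothetical helper, not part of Lucene: textbook mean average precision.
public class MeanAveragePrecision {

    // Average precision for one topic: the sum of precision@k over the ranks k
    // where a relevant document appears, divided by the number of relevant
    // documents for the topic. Assumes `ranked` has no duplicate entries.
    public static double averagePrecision(List<String> ranked, Set<String> relevant) {
        if (relevant.isEmpty()) {
            return 0.0;
        }
        int hits = 0;
        double sum = 0.0;
        for (int i = 0; i < ranked.size(); i++) {
            if (relevant.contains(ranked.get(i))) {
                hits++;
                sum += (double) hits / (i + 1); // precision at rank i + 1
            }
        }
        return sum / relevant.size();
    }

    // Mean of the per-topic average precision over all judged topics.
    // A topic with no ranked list is scored as an empty result, i.e. 0.
    public static double meanAveragePrecision(Map<String, List<String>> runs,
                                              Map<String, Set<String>> qrels) {
        if (qrels.isEmpty()) {
            return 0.0;
        }
        double total = 0.0;
        for (Map.Entry<String, Set<String>> e : qrels.entrySet()) {
            List<String> ranked = runs.getOrDefault(e.getKey(), List.of());
            total += averagePrecision(ranked, e.getValue());
        }
        return total / qrels.size();
    }
}
```

Under these assumptions, a topic that returns no documents contributes zero to the mean, so queries that parse to nothing would pull MAP down regardless of the similarity in use.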