Subject: SpanScorer handling of non-disjoint phrases
From: David Kaelbling
To: java-dev@lucene.apache.org
Cc: David Kaelbling
Date: Wed, 23 Apr 2008 16:15:48 -0400

Hi,

I've been using the 2.3.1 contrib highlighter with the 2/10/2008 SpanHighlighter patch, and have run into some trouble. If a query contains two phrases that share terms (e.g. "hello world" and "hello goodbye"), the SpanScorer does not seem to highlight 'hello' consistently.

It looks to me like WeightedSpanTermExtractor.extract() is clobbering the span positions for 'hello' the second time it encounters the term. Should terms.putAll(booleanTerms) and terms.putAll(disjunctTerms) really be replacing the old entry, or should they try to addPositionSpans()?

Thanks,
David

PS: And while I'm asking, it looks like getWeightedSpanTermsWithScores() will wrap the cachingTokenFilter passed to it by SpanScorer.init() in another CachingTokenFilter, duplicating the cache?

--
David Kaelbling
Senior Software Engineer
Black Duck Software, Inc.
dkaelbling@blackducksoftware.com
T +1.781.810.2041
F +1.781.891.5145
http://www.blackducksoftware.com
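The putAll() versus addPositionSpans() question can be illustrated with a small, self-contained model. This is a minimal sketch under stated assumptions, not the highlighter's code. SpanEntry, phraseTerms, mergeByReplacing and mergeByAppending are hypothetical names. SpanEntry only loosely mirrors the role of the highlighter's WeightedSpanTerm, and spans are modelled as inclusive [start, end] token positions. For the text "hello world and hello goodbye", the first phrase is assumed to match positions 0-1 and the second positions 3-4.

```java
import java.util.ArrayList;
import java.util.HashMap;
import java.util.List;
import java.util.Map;

// Hypothetical model of the per-term span bookkeeping discussed in the message.
// SpanEntry is a simplified stand-in, NOT Lucene's WeightedSpanTerm.
public class SpanMergeDemo {

    // A query term plus the inclusive [start, end] token spans in which it matched.
    static final class SpanEntry {
        final String term;
        final List<int[]> positionSpans = new ArrayList<>();

        SpanEntry(String term) {
            this.term = term;
        }

        void addPositionSpans(List<int[]> spans) {
            positionSpans.addAll(spans);
        }

        // True if the token position lies inside any recorded span.
        boolean checkPosition(int position) {
            for (int[] span : positionSpans) {
                if (position >= span[0] && position <= span[1]) {
                    return true;
                }
            }
            return false;
        }
    }

    // Builds the entries one phrase match contributes: every word gets the same span.
    static Map<String, SpanEntry> phraseTerms(int start, int end, String... words) {
        Map<String, SpanEntry> result = new HashMap<>();
        for (String word : words) {
            SpanEntry entry = new SpanEntry(word);
            List<int[]> spans = new ArrayList<>();
            spans.add(new int[] {start, end});
            entry.addPositionSpans(spans);
            result.put(word, entry);
        }
        return result;
    }

    // Map.putAll semantics: a later entry for the same term replaces the earlier one.
    static void mergeByReplacing(Map<String, SpanEntry> terms, Map<String, SpanEntry> phrase) {
        terms.putAll(phrase);
    }

    // Alternative: append the new spans to an existing entry instead of replacing it.
    static void mergeByAppending(Map<String, SpanEntry> terms, Map<String, SpanEntry> phrase) {
        for (Map.Entry<String, SpanEntry> e : phrase.entrySet()) {
            SpanEntry existing = terms.get(e.getKey());
            if (existing == null) {
                terms.put(e.getKey(), e.getValue());
            } else {
                existing.addPositionSpans(e.getValue().positionSpans);
            }
        }
    }

    public static void main(String[] args) {
        // Tokens: hello(0) world(1) and(2) hello(3) goodbye(4)
        Map<String, SpanEntry> replaced = new HashMap<>();
        mergeByReplacing(replaced, phraseTerms(0, 1, "hello", "world"));
        mergeByReplacing(replaced, phraseTerms(3, 4, "hello", "goodbye"));

        Map<String, SpanEntry> appended = new HashMap<>();
        mergeByAppending(appended, phraseTerms(0, 1, "hello", "world"));
        mergeByAppending(appended, phraseTerms(3, 4, "hello", "goodbye"));

        SpanEntry r = replaced.get("hello");
        SpanEntry a = appended.get("hello");
        System.out.println("replace: hello@0=" + r.checkPosition(0) + " hello@3=" + r.checkPosition(3));
        System.out.println("append:  hello@0=" + a.checkPosition(0) + " hello@3=" + a.checkPosition(3));
        // replace: hello@0=false hello@3=true
        // append:  hello@0=true hello@3=true
    }
}
```

With the replacing merge, the first 'hello' (position 0) is no longer inside any recorded span. That would produce the inconsistent highlighting the message describes. The appending merge keeps both spans. Whether appending is the right change for the real extractor is the open question the message asks.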