Message-ID: <1006909263.563131267220307915.JavaMail.jira@brutus.apache.org>
Date: Fri, 26 Feb 2010 21:38:27 +0000 (UTC)
From: "Michael Goddard (JIRA)"
To: java-dev@lucene.apache.org
Subject: [jira] Updated: (LUCENE-2287) Unexpected terms are highlighted within nested SpanQuery instances
In-Reply-To: <1678332126.556831267201647906.JavaMail.jira@brutus.apache.org>

     [ https://issues.apache.org/jira/browse/LUCENE-2287?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Michael Goddard updated LUCENE-2287:
------------------------------------

    Attachment: LUCENE-2287.patch

0 errors, 2 failures.

> Unexpected terms are highlighted within nested SpanQuery instances
> ------------------------------------------------------------------
>
>                 Key: LUCENE-2287
>                 URL: https://issues.apache.org/jira/browse/LUCENE-2287
>             Project: Lucene - Java
>          Issue Type: Improvement
>          Components: contrib/highlighter
>    Affects Versions: 2.9.1
>         Environment: Linux, Solaris, Windows
>            Reporter: Michael Goddard
>            Priority: Minor
>         Attachments: LUCENE-2287.patch, LUCENE-2287.patch, LUCENE-2287.patch, LUCENE-2287.patch
>
>   Original Estimate: 336h
>  Remaining Estimate: 336h
>
> I haven't yet been able to resolve why I'm seeing spurious highlighting in nested SpanQuery instances. Briefly, the issue is illustrated by the second instance of "Lucene" being highlighted in the test below, when it doesn't satisfy the inner span. There's been some discussion about this on the java-dev list, and I'm opening this issue now because I have made some initial progress on this.
> This new test, added to the HighlighterTest class in lucene_2_9_1, illustrates this:
> /*
>  * Ref: http://www.lucidimagination.com/blog/2009/07/18/the-spanquery/
>  */
> public void testHighlightingNestedSpans2() throws Exception {
>   String theText = "The Lucene was made by Doug Cutting and Lucene great Hadoop was"; // Problem
>   //String theText = "The Lucene was made by Doug Cutting and the great Hadoop was"; // Works okay
>   String fieldName = "SOME_FIELD_NAME";
>   SpanNearQuery spanNear = new SpanNearQuery(new SpanQuery[] {
>     new SpanTermQuery(new Term(fieldName, "lucene")),
>     new SpanTermQuery(new Term(fieldName, "doug")) }, 5, true);
>   Query query = new SpanNearQuery(new SpanQuery[] { spanNear,
>     new SpanTermQuery(new Term(fieldName, "hadoop")) }, 4, true);
>   String expected = "The Lucene was made by Doug Cutting and Lucene great Hadoop was";
>   //String expected = "The Lucene was made by Doug Cutting and the great Hadoop was";
>   String observed = highlightField(query, fieldName, theText);
>   System.out.println("Expected: \"" + expected + "\n" + "Observed: \"" + observed);
>   assertEquals("Why is that second instance of the term \"Lucene\" highlighted?", expected, observed);
> }
> Is this an issue that's arisen before? I've been reading through the source to QueryScorer, WeightedSpanTerm, WeightedSpanTermExtractor, Spans, and NearSpansOrdered, but haven't found the solution yet. Initially, I thought that the extractWeightedSpanTerms method in WeightedSpanTermExtractor should be called on each clause of a SpanNearQuery or SpanOrQuery, but that didn't get me too far.

--
This message is automatically generated by JIRA.
-
You can reply to this email to add a comment to the issue online.
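
To make the expectation in the test concrete, here is a minimal sketch in plain Java (no Lucene dependency) of which token positions the nested ordered spans cover. It is a toy model, not Lucene's NearSpansOrdered: the class NestedSpanSketch, its Span type, the whitespace tokenizer, and the "gap = next start minus previous end" slop rule are all assumptions made for illustration.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.Locale;

/** Toy model of ordered span-near matching; NOT Lucene's NearSpansOrdered. */
final class NestedSpanSketch {

    /** Half-open token-position range [start, end). */
    static final class Span {
        final int start;
        final int end;

        Span(int start, int end) {
            this.start = start;
            this.end = end;
        }

        @Override
        public String toString() {
            return "[" + start + "," + end + ")";
        }
    }

    /** Lowercases and splits on whitespace (a stand-in for a real analyzer). */
    static String[] tokenize(String text) {
        return text.toLowerCase(Locale.ROOT).trim().split("\\s+");
    }

    /** One single-position span per occurrence of the term. */
    static List<Span> termSpans(String[] tokens, String term) {
        List<Span> spans = new ArrayList<>();
        for (int i = 0; i < tokens.length; i++) {
            if (tokens[i].equals(term)) {
                spans.add(new Span(i, i + 1));
            }
        }
        return spans;
    }

    /**
     * Pairs every span of the first clause with every later, non-overlapping
     * span of the second clause whose gap (second.start - first.end) is
     * within slop. Assumed simplification of ordered near matching.
     */
    static List<Span> orderedNear(List<Span> first, List<Span> second, int slop) {
        List<Span> matches = new ArrayList<>();
        for (Span a : first) {
            for (Span b : second) {
                int gap = b.start - a.end;
                if (gap >= 0 && gap <= slop) {
                    matches.add(new Span(a.start, b.end));
                }
            }
        }
        return matches;
    }

    public static void main(String[] args) {
        String[] tokens = tokenize(
            "The Lucene was made by Doug Cutting and Lucene great Hadoop was");
        // Inner query: "lucene" followed by "doug", slop 5, in order.
        List<Span> inner = orderedNear(
            termSpans(tokens, "lucene"), termSpans(tokens, "doug"), 5);
        // Outer query: the inner span followed by "hadoop", slop 4, in order.
        List<Span> outer = orderedNear(inner, termSpans(tokens, "hadoop"), 4);
        System.out.println("inner spans: " + inner);
        System.out.println("outer spans: " + outer);
    }
}
```

Under this model the problem text yields a single inner span covering token positions 1 to 5 ("Lucene ... Doug") and a single outer span covering positions 1 to 10. The second "Lucene" is at position 8, which lies inside the outer range but is not part of any inner match. Whether the highlighter's handling of that outer range is what produces the spurious highlight is only a guess and would need to be checked against WeightedSpanTermExtractor.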