Date: Thu, 9 Feb 2017 19:00:43 +0000 (UTC)
From: "Michael Braun (JIRA)"
To: dev@lucene.apache.org
Subject: [jira] [Commented] (LUCENE-7682) UnifiedHighlighter not highlighting all terms relevant in SpanNearQuery

    [ https://issues.apache.org/jira/browse/LUCENE-7682?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15860002#comment-15860002 ]

Michael Braun commented on LUCENE-7682:
---------------------------------------

I think I know why some of this is going on. In NearSpansOrdered, stretchToOrder works out the effective position length it needs to search over and advances each spans to the relevant distance for a match. The second span is advanced just far enough that the first instance of "feed" matches (which satisfies the query); matchEnd is set to that "feed" occurrence's end position (matchWidth is updated as well), and it stops there. So NearSpansOrdered effectively never sees the last occurrence of "feed" when twoPhaseCurrentDocMatches() is called (from getTermToSpans in PhraseHelper). The end position of the first "feed" occurrence is what gets used, instead of the last end position within the slop.
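
To make the behaviour described above concrete, here is a small standalone Java sketch. It is not Lucene code and does not call NearSpansOrdered; the class SpanMatchSketch, its method names, and the token positions (wildlife at 3, feed at 4 and 7, assuming every word in the sample text keeps its own position) are assumptions made purely for illustration. It contrasts a matcher that stops at the first in-order match with one that keeps collecting every occurrence within the slop.

```java
import java.util.ArrayList;
import java.util.List;

// Hypothetical toy model of an ordered two-term "near" match; not Lucene's implementation.
class SpanMatchSketch {

    // True if `second` occurs after `first` with at most `slop` positions in between.
    static boolean withinSlop(int first, int second, int slop) {
        return second > first && (second - first - 1) <= slop;
    }

    // Stops at the first occurrence of the second term that satisfies the query,
    // which is the behaviour the comment above attributes to the current matching.
    static List<Integer> firstMatchOnly(int[] firstTerm, int[] secondTerm, int slop) {
        List<Integer> matched = new ArrayList<>();
        for (int f : firstTerm) {
            for (int s : secondTerm) {
                if (withinSlop(f, s, slop)) {
                    matched.add(s);
                    break; // match found, later occurrences are never looked at
                }
            }
        }
        return matched;
    }

    // Keeps going and records every occurrence of the second term within the slop,
    // which is what a highlighter would need in order to mark all relevant terms.
    static List<Integer> allMatches(int[] firstTerm, int[] secondTerm, int slop) {
        List<Integer> matched = new ArrayList<>();
        for (int f : firstTerm) {
            for (int s : secondTerm) {
                if (withinSlop(f, s, slop) && !matched.contains(s)) {
                    matched.add(s);
                }
            }
        }
        return matched;
    }

    public static void main(String[] args) {
        int[] wildlife = {3};   // assumed position of "wildlife"
        int[] feed = {4, 7};    // assumed positions of the two "feed" tokens
        System.out.println("first match only: " + firstMatchOnly(wildlife, feed, 9));
        System.out.println("all within slop:  " + allMatches(wildlife, feed, 9));
    }
}
```

With these assumed positions, the first variant reports only the "feed" at position 4, while the second also reports the one at position 7, mirroring the missing highlight in the report.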

> UnifiedHighlighter not highlighting all terms relevant in SpanNearQuery
> -----------------------------------------------------------------------
>
>                 Key: LUCENE-7682
>                 URL: https://issues.apache.org/jira/browse/LUCENE-7682
>             Project: Lucene - Core
>          Issue Type: Bug
>          Components: modules/highlighter
>            Reporter: Michael Braun
>
> Original text: "Something for protecting wildlife feed in a feed thing."
> Query is:
>  SpanNearQuery with Slop 9 - in order -
>      1. SpanTermQuery(wildlife)
>      2. SpanTermQuery(feed)
> This should highlight both instances of "feed" since they are both within slop of 9 of "wildlife". However, only the first instance is highlighted. This occurs with unordered SpanNearQuery as well. Test below replicates. Affects both the current 6.x line and master.
> Test that fits within TestUnifiedHighlighterMTQ:
> {code}
>   public void testOrderedSpanNearQueryWithDupeTerms() throws Exception {
>     RandomIndexWriter iw = new RandomIndexWriter(random(), dir, indexAnalyzer);
>     Document doc = new Document();
>     doc.add(new Field("body", "Something for protecting wildlife feed in a feed thing.", fieldType));
>     doc.add(newTextField("id", "id", Field.Store.YES));
>     iw.addDocument(doc);
>     IndexReader ir = iw.getReader();
>     iw.close();
>     IndexSearcher searcher = newSearcher(ir);
>     UnifiedHighlighter highlighter = new UnifiedHighlighter(searcher, indexAnalyzer);
>     int docID = searcher.search(new TermQuery(new Term("id", "id")), 1).scoreDocs[0].doc;
>     SpanTermQuery termOne = new SpanTermQuery(new Term("body", "wildlife"));
>     SpanTermQuery termTwo = new SpanTermQuery(new Term("body", "feed"));
>     SpanNearQuery topQuery = new SpanNearQuery.Builder("body", true)
>         .setSlop(9)
>         .addClause(termOne)
>         .addClause(termTwo)
>         .build();
>     int[] docIds = new int[] {docID};
>     String snippets[] = highlighter.highlightFields(new String[] {"body"}, topQuery, docIds, new int[] {2}).get("body");
>     assertEquals(1, snippets.length);
>     assertEquals("Something for protecting <b>wildlife</b> <b>feed</b> in a <b>feed</b> thing.", snippets[0]);
>     ir.close();
>   }
> {code}

--
This message was sent by Atlassian JIRA
(v6.3.15#6346)