lucene-dev mailing list archives

From "Michael Braun (JIRA)" <j...@apache.org>
Subject [jira] [Commented] (LUCENE-7682) UnifiedHighlighter not highlighting all terms relevant in SpanNearQuery
Date Thu, 09 Feb 2017 19:00:43 GMT

    [ https://issues.apache.org/jira/browse/LUCENE-7682?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15860002#comment-15860002 ]

Michael Braun commented on LUCENE-7682:
---------------------------------------

I think I know why some of this is happening. In NearSpansOrdered, stretchToOrder works out
the effective position length it needs to search over and advances each span to the relevant
distance for a match. The second span is advanced only far enough for the first instance of
"feed" to match, which satisfies the query. matchEnd is then set to that occurrence's end
position (matchWidth is updated as well), and the search stops there. As a result,
NearSpansOrdered never sees the last occurrence of "feed" when twoPhaseCurrentDocMatches()
is called (from getTermToSpans in PhraseHelper). The end position of the first "feed"
occurrence is used instead of the last end position within the slop.
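
To make the effect concrete, here is a minimal standalone sketch. It is not Lucene code:
the class OrderedNearSketch and both helper methods are hypothetical, the token positions
(wildlife at 3, feed at 4 and 7) assume one position per word with no stop-word removal,
and measuring slop as the gap between the first term's end and the second term's start is
a simplifying assumption. It only contrasts "stop at the first match" with "collect every
match within the slop".

```java
import java.util.ArrayList;
import java.util.List;

public class OrderedNearSketch {

    // Hypothetical helper: returns the (exclusive) end position of the FIRST
    // second-term occurrence that follows the first term within the slop,
    // then stops looking. Returns -1 if nothing matches.
    public static int firstMatchEnd(int firstStart, int[] secondStarts, int slop) {
        int firstEnd = firstStart + 1; // single-term span, exclusive end
        for (int start : secondStarts) {
            if (start >= firstEnd && start - firstEnd <= slop) {
                return start + 1; // stop at the first satisfying occurrence
            }
        }
        return -1;
    }

    // Hypothetical helper: returns the end positions of ALL second-term
    // occurrences that follow the first term within the slop.
    public static List<Integer> allMatchEnds(int firstStart, int[] secondStarts, int slop) {
        int firstEnd = firstStart + 1;
        List<Integer> ends = new ArrayList<>();
        for (int start : secondStarts) {
            if (start >= firstEnd && start - firstEnd <= slop) {
                ends.add(start + 1);
            }
        }
        return ends;
    }

    public static void main(String[] args) {
        // Assumed positions in "Something for protecting wildlife feed in a feed thing."
        int wildlife = 3;
        int[] feed = {4, 7};
        System.out.println(firstMatchEnd(wildlife, feed, 9)); // prints 5
        System.out.println(allMatchEnds(wildlife, feed, 9));  // prints [5, 8]
    }
}
```

With a slop of 9, the first helper stops at end position 5 and never reaches the second
"feed", while the second helper also reports end position 8. The highlighter would need
the latter to mark both occurrences.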

> UnifiedHighlighter not highlighting all terms relevant in SpanNearQuery
> -----------------------------------------------------------------------
>
>                 Key: LUCENE-7682
>                 URL: https://issues.apache.org/jira/browse/LUCENE-7682
>             Project: Lucene - Core
>          Issue Type: Bug
>          Components: modules/highlighter
>            Reporter: Michael Braun
>
> Original text: "Something for protecting wildlife feed in a feed thing."
> Query is:
>    SpanNearQuery with Slop 9 - in order - 
>       1. SpanTermQuery(wildlife)
>       2. SpanTermQuery(feed)
> This should highlight both instances of "feed" since they are both within slop of 9 of
> "wildlife". However, only the first instance is highlighted. This occurs with unordered
> SpanNearQuery as well.  Test below replicates. Affects both the current 6.x line and master.
> Test that fits within TestUnifiedHighlighterMTQ:
> {code}
>   public void testOrderedSpanNearQueryWithDupeTerms() throws Exception {
>     RandomIndexWriter iw = new RandomIndexWriter(random(), dir, indexAnalyzer);
>     Document doc = new Document();
>     doc.add(new Field("body", "Something for protecting wildlife feed in a feed thing.", fieldType));
>     doc.add(newTextField("id", "id", Field.Store.YES));
>     iw.addDocument(doc);
>     IndexReader ir = iw.getReader();
>     iw.close();
>     IndexSearcher searcher = newSearcher(ir);
>     UnifiedHighlighter highlighter = new UnifiedHighlighter(searcher, indexAnalyzer);
>     int docID = searcher.search(new TermQuery(new Term("id", "id")), 1).scoreDocs[0].doc;
>     SpanTermQuery termOne = new SpanTermQuery(new Term("body", "wildlife"));
>     SpanTermQuery termTwo = new SpanTermQuery(new Term("body", "feed"));
>     SpanNearQuery topQuery = new SpanNearQuery.Builder("body", true)
>         .setSlop(9)
>         .addClause(termOne)
>         .addClause(termTwo)
>         .build();
>     int[] docIds = new int[] {docID};
>     String[] snippets = highlighter.highlightFields(new String[] {"body"}, topQuery,
>         docIds, new int[] {2}).get("body");
>     assertEquals(1, snippets.length);
>     assertEquals("Something for protecting <b>wildlife</b> <b>feed</b> in a <b>feed</b> thing.", snippets[0]);
>     ir.close();
>   }
> {code}



--
This message was sent by Atlassian JIRA
(v6.3.15#6346)


