lucene-dev mailing list archives

From "David Smiley (JIRA)" <j...@apache.org>
Subject [jira] [Commented] (LUCENE-8306) Allow iteration over the term positions of a Match
Date Mon, 14 May 2018 14:36:00 GMT

    [ https://issues.apache.org/jira/browse/LUCENE-8306?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16474278#comment-16474278 ]

David Smiley commented on LUCENE-8306:
--------------------------------------

{quote}Could we address this need by calling extract terms on the weight, and filtering the
positions/offsets of these terms to only keep those that intersect with the returned matches?
{quote}
Nice idea, but it would be inaccurate, and I think we should aim for accurate results with
this new API.

For example, if the query is "Game of Thrones" near "Show", then extracting terms is going
to find "of" and the other words. But "of" ought to be a match only when it's in the phrase
"Game of Thrones", not at other positions that happen to fall within the larger span near "Show".
Our highlighters got this wrong for a long time; only recently was the UnifiedHighlighter improved
to resolve this by using the SpanCollector API (LUCENE-8121, for 7.3, yay).
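
To make the difference concrete, here is a minimal, self-contained sketch in plain Java. It does not use any Lucene API: the class name NearPhraseDemo, its methods, the sample sentence and the maxGap parameter are all made up for illustration, and the "near" semantics are a simplification of what a real span or interval query does. It compares filtering extracted terms by the match span against tracking which positions actually took part in the match.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.Collections;
import java.util.List;
import java.util.Set;

// Hypothetical illustration only; none of these names exist in Lucene.
class NearPhraseDemo {

    /** Start positions at which the whole phrase occurs in the token list. */
    public static List<Integer> phraseStarts(List<String> tokens, List<String> phrase) {
        List<Integer> starts = new ArrayList<>();
        for (int i = 0; i + phrase.size() <= tokens.size(); i++) {
            if (tokens.subList(i, i + phrase.size()).equals(phrase)) {
                starts.add(i);
            }
        }
        return starts;
    }

    /**
     * Accurate approach: positions of the terms that took part in the first
     * "phrase near term" match, where at most maxGap tokens lie between them.
     * Returns an empty list if there is no match.
     */
    public static List<Integer> accuratePositions(
            List<String> tokens, List<String> phrase, String term, int maxGap) {
        for (int ps : phraseStarts(tokens, phrase)) {
            int pe = ps + phrase.size() - 1;
            for (int t = 0; t < tokens.size(); t++) {
                if (!tokens.get(t).equals(term) || (t >= ps && t <= pe)) {
                    continue; // not the term, or it sits inside the phrase itself
                }
                int gap = t > pe ? t - pe - 1 : ps - t - 1;
                if (gap <= maxGap) {
                    List<Integer> positions = new ArrayList<>();
                    for (int p = ps; p <= pe; p++) {
                        positions.add(p);
                    }
                    positions.add(t);
                    Collections.sort(positions);
                    return positions;
                }
            }
        }
        return List.of();
    }

    /** Naive approach: every occurrence of any query term inside [start, end]. */
    public static List<Integer> naivePositions(
            List<String> tokens, Set<String> queryTerms, int start, int end) {
        List<Integer> positions = new ArrayList<>();
        for (int i = start; i <= end; i++) {
            if (queryTerms.contains(tokens.get(i))) {
                positions.add(i);
            }
        }
        return positions;
    }

    public static void main(String[] args) {
        List<String> tokens =
                Arrays.asList("game of thrones is the best of any show".split(" "));
        List<String> phrase = List.of("game", "of", "thrones");

        List<Integer> accurate = accuratePositions(tokens, phrase, "show", 5);
        // The overall match span runs from the first to the last matching position.
        int start = accurate.get(0);
        int end = accurate.get(accurate.size() - 1);
        List<Integer> naive =
                naivePositions(tokens, Set.of("game", "of", "thrones", "show"), start, end);

        System.out.println("accurate: " + accurate); // accurate: [0, 1, 2, 8]
        System.out.println("naive:    " + naive);    // naive:    [0, 1, 2, 6, 8]
    }
}
```

The naive result includes position 6, the second "of", even though it is not part of the phrase; that is the kind of false hit described above.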

> Allow iteration over the term positions of a Match
> --------------------------------------------------
>
>                 Key: LUCENE-8306
>                 URL: https://issues.apache.org/jira/browse/LUCENE-8306
>             Project: Lucene - Core
>          Issue Type: New Feature
>            Reporter: Alan Woodward
>            Assignee: Alan Woodward
>            Priority: Major
>         Attachments: LUCENE-8306.patch, LUCENE-8306.patch
>
>
> For multi-term queries such as phrase queries, the matches API currently just returns
> information about the span of the whole match. It would be useful to also expose information
> about the matching terms within the phrase. The same would apply to Spans and Interval queries.




