lucene-dev mailing list archives

From "Alan Woodward (JIRA)" <j...@apache.org>
Subject [jira] [Commented] (LUCENE-8401) Add PassageBuilder to help construct highlights using MatchesIterator
Date Mon, 16 Jul 2018 10:27:00 GMT

    [ https://issues.apache.org/jira/browse/LUCENE-8401?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16545040#comment-16545040 ]

Alan Woodward commented on LUCENE-8401:
---------------------------------------

Here is a patch containing the following classes:
 * Passage -> a representation of a highlight snippet, with text, start and end offset,
and a list of internal hits
 * PassageBreaker -> an interface that defines where passages should start and end, and
if a hit should be added to the current passage or should start a new one
 * PassageBuilder -> public API that iteratively returns new Passage objects
 * BreakIteratorPassageBreaker -> simple implementation of PassageBreaker
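
As a rough illustration of the last item, a break-iterator-based breaker could widen a hit to the boundaries of the sentence that contains it. The sketch below uses only the JDK's java.text.BreakIterator; the class name SentenceBoundsSketch and the method passageBounds are hypothetical and are not taken from the patch.

```java
import java.text.BreakIterator;
import java.util.Locale;

public class SentenceBoundsSketch {

    /**
     * Hypothetical helper, not part of the patch: returns {start, end} of the
     * sentence-aligned span covering the hit [hitStart, hitEnd).
     * Assumes 0 <= hitStart < hitEnd <= text.length().
     */
    static int[] passageBounds(String text, int hitStart, int hitEnd) {
        BreakIterator bi = BreakIterator.getSentenceInstance(Locale.ROOT);
        bi.setText(text);

        // preceding(n) returns the last boundary strictly before n,
        // so asking for hitStart + 1 yields the last boundary <= hitStart.
        int start = bi.preceding(hitStart + 1);
        if (start == BreakIterator.DONE) {
            start = 0;
        }

        // following(n) returns the first boundary strictly after n,
        // so asking for hitEnd - 1 yields the first boundary >= hitEnd.
        int end = bi.following(hitEnd - 1);
        if (end == BreakIterator.DONE) {
            end = text.length();
        }
        return new int[] {start, end};
    }

    public static void main(String[] args) {
        String text = "First sentence here. Second one has a hit. Third.";
        int hitStart = text.indexOf("hit");
        int[] bounds = passageBounds(text, hitStart, hitStart + 3);
        System.out.println(text.substring(bounds[0], bounds[1]).trim());
    }
}
```

A real PassageBreaker would presumably also decide whether a later hit joins the current passage; this sketch covers only the start and end boundaries.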

The passage builder uses a wrapper around its MatchesIterator to enable it to peek at the
position of the next hit, which means that we can improve clustering for hits that look like
[A .............. B .. C], where A and B are within the maximum snippet size, but A and C
are not.
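
To make the [A .............. B .. C] case concrete, here is a minimal, self-contained sketch of clustering with one hit of lookahead. It does not use Lucene classes: hits are plain start offsets, PeekingHits stands in for the wrapper around MatchesIterator, and the rule for when to start a new passage is an assumption made for illustration, not necessarily what the patch does.

```java
import java.util.ArrayList;
import java.util.Iterator;
import java.util.List;

public class PeekingClusterSketch {

    /** Hypothetical stand-in for the wrapper: iterates hit offsets and can peek one ahead. */
    static final class PeekingHits {
        private final Iterator<Integer> in;
        private Integer lookahead;

        PeekingHits(Iterator<Integer> in) {
            this.in = in;
            advance();
        }

        private void advance() {
            lookahead = in.hasNext() ? in.next() : null;
        }

        boolean hasNext() {
            return lookahead != null;
        }

        int next() {
            int value = lookahead;
            advance();
            return value;
        }

        /** Returns the next hit without consuming it, or null if there is none. */
        Integer peek() {
            return lookahead;
        }
    }

    /** Groups sorted hit offsets into passages spanning at most maxSize characters. */
    static List<List<Integer>> cluster(List<Integer> hits, int maxSize) {
        List<List<Integer>> passages = new ArrayList<>();
        PeekingHits it = new PeekingHits(hits.iterator());
        List<Integer> current = new ArrayList<>();
        while (it.hasNext()) {
            int hit = it.next();
            if (!current.isEmpty()) {
                int first = current.get(0);
                int last = current.get(current.size() - 1);
                boolean fits = hit - first <= maxSize;
                Integer upcoming = it.peek();
                // Assumed rule: break early if the upcoming hit cannot join the
                // current passage but sits closer to this hit than the previous one.
                boolean betterWithNext = upcoming != null
                        && upcoming - first > maxSize
                        && upcoming - hit <= maxSize
                        && upcoming - hit < hit - last;
                if (!fits || betterWithNext) {
                    passages.add(current);
                    current = new ArrayList<>();
                }
            }
            current.add(hit);
        }
        if (!current.isEmpty()) {
            passages.add(current);
        }
        return passages;
    }

    public static void main(String[] args) {
        // A = 0, B = 80, C = 95, maximum snippet size 90.
        System.out.println(cluster(List.of(0, 80, 95), 90)); // prints [[0], [80, 95]]
    }
}
```

Without the peek, a greedy builder would group A with B and leave C alone; with it, B and C end up in the same passage.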

> Add PassageBuilder to help construct highlights using MatchesIterator
> ---------------------------------------------------------------------
>
>                 Key: LUCENE-8401
>                 URL: https://issues.apache.org/jira/browse/LUCENE-8401
>             Project: Lucene - Core
>          Issue Type: New Feature
>          Components: modules/highlighter
>            Reporter: Alan Woodward
>            Assignee: Alan Woodward
>            Priority: Major
>         Attachments: LUCENE-8401.patch
>
>
> Jim and I discussed a while back the idea of adding highlighter components, rather than
> a fully-fledged highlighter, which would allow users to build their own specialised highlighters. To
> that end, I'd like to add a PassageBuilder class that uses the Matches API to break text up
> into passages containing hits.



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)

---------------------------------------------------------------------
To unsubscribe, e-mail: dev-unsubscribe@lucene.apache.org
For additional commands, e-mail: dev-help@lucene.apache.org

