lucene-dev mailing list archives

From "Timothy M. Rodriguez (JIRA)" <j...@apache.org>
Subject [jira] [Commented] (LUCENE-8145) UnifiedHighlighter should use single OffsetEnum rather than List<OffsetEnum>
Date Thu, 01 Feb 2018 00:40:00 GMT

    [ https://issues.apache.org/jira/browse/LUCENE-8145?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16347823#comment-16347823 ]

Timothy M. Rodriguez commented on LUCENE-8145:
----------------------------------------------

Thanks for the CC [~dsmiley].

[~romseygeek] really nice change!  It definitely simplifies things quite a bit, and conceptually
one meta OffsetsEnum over the field makes more sense than the previous list.

I'm in favor of keeping the summed frequency on MTQ (MultiTermQuery), or at least preserving
a mechanism to keep it on.  The extra occurrences are not spurious in every case.  For example,
consider "expert" systems where users are accustomed to using wildcards as stemming-like
expressions, e.g. purchas* to get variants of the word purchase.  In those cases,
the extra frequency counts would hopefully select a better passage.
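
To illustrate the point, here is a rough sketch of how summing the frequency across all terms
matched by purchas* can change which passage wins.  This is not the highlighter's actual
scoring code: the class MtqFrequencySketch, its method names, the saturating tfWeight
function, and the constant k1 are all assumptions made up for this example.

```java
// Hypothetical sketch: NOT Lucene's PassageScorer, just a saturating tf score
// used to compare "summed frequency" against "count the wildcard once".
final class MtqFrequencySketch {

    // Saturating term-frequency weight: grows with freq but flattens out.
    static double tfWeight(int freq) {
        final double k1 = 1.2; // assumed saturation constant
        return freq / (freq + k1);
    }

    // Count how many tokens in the passage start with the wildcard prefix.
    static int summedFrequency(String passage, String prefix) {
        int freq = 0;
        for (String token : passage.toLowerCase().split("\\W+")) {
            if (token.startsWith(prefix)) {
                freq++;
            }
        }
        return freq;
    }

    // The wildcard is treated as one term whose freq is the sum of all matches.
    static double scoreSummed(String passage, String prefix) {
        return tfWeight(summedFrequency(passage, prefix));
    }

    // The wildcard is counted at most once per passage.
    static double scoreCountedOnce(String passage, String prefix) {
        return tfWeight(Math.min(1, summedFrequency(passage, prefix)));
    }

    public static void main(String[] args) {
        String a = "Purchasing managers purchased what the purchase order listed.";
        String b = "The purchase was approved last week.";
        System.out.println("summed:       A=" + scoreSummed(a, "purchas")
                + " B=" + scoreSummed(b, "purchas"));
        System.out.println("counted once: A=" + scoreCountedOnce(a, "purchas")
                + " B=" + scoreCountedOnce(b, "purchas"));
    }
}
```

With the summed frequency, the passage containing three variants outscores the one containing
a single variant; when the wildcard is counted once, the two passages tie.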



I'm not so sure about passing a scorer and content length into setScore, though.  That feels
awkward to me.  If we were to keep it this way, I'd argue a Passage should receive the
PassageScorer and content length at construction instead of via the setScore method.
If we did that, I think we could incrementally build the score instead of tracking terms and
frequencies for a later score calculation?  Another choice is to move a lot of the scoring
behavior elsewhere, perhaps introducing another class that tracks the terms and score in a
passage, analogous to Weight?
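
A minimal sketch of the construction-time idea, under stated assumptions: the names
IncrementalPassageSketch, SketchScorer, DemoScorer, and addMatch are hypothetical and are not
the patch's or Lucene's API, and the tf function is assumed not to depend on the final passage
length.  Note that this version still keeps a per-term frequency map so it can apply the tf
delta on each match; what it avoids is deferring the whole calculation to a later setScore call.

```java
import java.util.HashMap;
import java.util.Map;

// Hypothetical passage that receives its scorer and content length at construction
// and updates a running score on every match.
final class IncrementalPassageSketch {
    private final SketchScorer scorer;
    private final int contentLength;
    private final Map<String, Integer> freqs = new HashMap<>();
    private double score;

    IncrementalPassageSketch(SketchScorer scorer, int contentLength) {
        this.scorer = scorer;
        this.contentLength = contentLength;
    }

    // Record one more occurrence of a term and add only the change in its contribution.
    void addMatch(String term, int totalTermFreq) {
        int oldFreq = freqs.getOrDefault(term, 0);
        int newFreq = oldFreq + 1;
        freqs.put(term, newFreq);
        double weight = scorer.weight(contentLength, totalTermFreq);
        score += weight * (scorer.tf(newFreq) - scorer.tf(oldFreq));
    }

    double score() {
        return score;
    }

    public static void main(String[] args) {
        IncrementalPassageSketch passage = new IncrementalPassageSketch(new DemoScorer(), 1000);
        passage.addMatch("purchase", 10);
        passage.addMatch("purchase", 10);
        passage.addMatch("order", 5);
        System.out.println("running score = " + passage.score());
    }
}

// Illustrative scorer contract; tf(0) must be 0 for the delta update to be correct.
interface SketchScorer {
    double weight(int contentLength, int totalTermFreq);

    double tf(int freqInPassage);
}

// Assumed formulas, chosen only to make the sketch runnable.
final class DemoScorer implements SketchScorer {
    @Override
    public double weight(int contentLength, int totalTermFreq) {
        // Rarer terms (relative to content length) get a larger weight.
        return Math.log(1.0 + contentLength / (1.0 + totalTermFreq));
    }

    @Override
    public double tf(int freqInPassage) {
        // Saturating term frequency; evaluates to 0 when the frequency is 0.
        return freqInPassage / (freqInPassage + 1.2);
    }
}
```

If tf also needs the finished passage length, the running total would have to be corrected
when the passage is closed, which is one argument for the separate Weight-like class instead.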


> UnifiedHighlighter should use single OffsetEnum rather than List<OffsetEnum>
> ----------------------------------------------------------------------------
>
>                 Key: LUCENE-8145
>                 URL: https://issues.apache.org/jira/browse/LUCENE-8145
>             Project: Lucene - Core
>          Issue Type: Improvement
>          Components: modules/highlighter
>            Reporter: Alan Woodward
>            Assignee: Alan Woodward
>            Priority: Minor
>         Attachments: LUCENE-8145.patch
>
>
> The UnifiedHighlighter deals with several different aspects of highlighting: finding
> highlight offsets, breaking content up into snippets, and passage scoring.  It would be nice
> to split this up so that consumers can use them separately.
> As a first step, I'd like to change the API of FieldOffsetStrategy to return a single
> unified OffsetsEnum, rather than a collection of them.  This will make it easier to expose
> the OffsetsEnum of a document directly from the highlighter, bypassing snippet extraction
> and scoring.



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)

---------------------------------------------------------------------
To unsubscribe, e-mail: dev-unsubscribe@lucene.apache.org
For additional commands, e-mail: dev-help@lucene.apache.org

