lucene-dev mailing list archives

From "ASF subversion and git services (JIRA)" <j...@apache.org>
Subject [jira] [Commented] (LUCENE-7438) UnifiedHighlighter
Date Tue, 04 Oct 2016 20:12:21 GMT

    [ https://issues.apache.org/jira/browse/LUCENE-7438?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15546521#comment-15546521
] 

ASF subversion and git services commented on LUCENE-7438:
---------------------------------------------------------

Commit 722e82712435ecf46c9868137d885484152f749b in lucene-solr's branch refs/heads/master
from [~dsmiley]
[ https://git-wip-us.apache.org/repos/asf?p=lucene-solr.git;h=722e827 ]

LUCENE-7438: New UnifiedHighlighter


> UnifiedHighlighter
> ------------------
>
>                 Key: LUCENE-7438
>                 URL: https://issues.apache.org/jira/browse/LUCENE-7438
>             Project: Lucene - Core
>          Issue Type: Improvement
>          Components: modules/highlighter
>    Affects Versions: 6.2
>            Reporter: Timothy M. Rodriguez
>            Assignee: David Smiley
>         Attachments: LUCENE-7438.patch, LUCENE_7438_UH_benchmark.patch, LUCENE_7438_UH_small_changes.patch
>
>
> The UnifiedHighlighter is an evolution of the PostingsHighlighter that is able to highlight
using offsets in either postings, term vectors, or from analysis (a TokenStream). Lucene’s
existing highlighters are mostly demarcated along offset source lines, whereas here it is
unified -- hence this proposed name. In this highlighter, the offset source strategy is separated
from the core highlighting functionality. The UnifiedHighlighter further improves on the PostingsHighlighter’s
design by supporting accurate phrase highlighting using an approach similar to the standard
highlighter’s WeightedSpanTermExtractor. The next major improvement is a hybrid offset source
strategy that utilizes postings and “light” term vectors (i.e. just the terms) for highlighting
multi-term queries (wildcards) without resorting to analysis. Phrase highlighting and wildcard
highlighting can both be disabled if you’d rather highlight a little faster albeit not as
accurately reflecting the query.
> We’ve benchmarked an earlier version of this highlighter comparing it to the other
highlighters and the results were exciting! It’s tempting to share those results but it’s
definitely due for another benchmark, so we’ll work on that. Performance was the main motivator
for creating the UnifiedHighlighter, as the standard Highlighter (the only one meeting Bloomberg
Law’s accuracy requirements) wasn’t fast enough, even with term vectors along with several
improvements we contributed back, and even after we forked it to highlight in multiple threads.
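
To make the idea of a separate offset source strategy concrete, here is a minimal, self-contained sketch of how a highlighter might pick where to read offsets from. It is only an illustration of the description above, not the code in the attached patch: the class name, the enum constants, and the choose() method with its boolean parameters are all assumptions made up for this example.

```java
/**
 * Hypothetical sketch of offset source selection. This is NOT the Lucene API;
 * every name here is invented to illustrate the description in LUCENE-7438.
 */
final class OffsetSourceSketch {

    /** Places a highlighter could read term offsets from (illustrative names). */
    enum OffsetSource { POSTINGS, TERM_VECTORS, POSTINGS_WITH_TERM_VECTORS, ANALYSIS }

    /**
     * Picks an offset source for one field.
     *
     * @param postingsHaveOffsets      offsets are indexed in the postings
     * @param hasTermVectors           term vectors are stored, possibly "light" (terms only)
     * @param termVectorsHaveOffsets   the stored term vectors also carry offsets
     * @param highlightsMultiTermQuery the query contains a wildcard-style clause to highlight
     */
    static OffsetSource choose(boolean postingsHaveOffsets,
                               boolean hasTermVectors,
                               boolean termVectorsHaveOffsets,
                               boolean highlightsMultiTermQuery) {
        if (postingsHaveOffsets) {
            if (!highlightsMultiTermQuery) {
                // Plain term queries can be served from postings alone.
                return OffsetSource.POSTINGS;
            }
            if (hasTermVectors) {
                // Hybrid: term vectors list the document's terms so wildcards can be
                // matched against them; postings then supply the offsets.
                return OffsetSource.POSTINGS_WITH_TERM_VECTORS;
            }
            // No term list is available for wildcard matching, so re-analyze the text.
            return OffsetSource.ANALYSIS;
        }
        if (hasTermVectors && termVectorsHaveOffsets) {
            return OffsetSource.TERM_VECTORS;
        }
        // Nothing usable was indexed: fall back to a TokenStream over the stored text.
        return OffsetSource.ANALYSIS;
    }
}
```

In this sketch, a field indexed with postings offsets plus light term vectors yields POSTINGS_WITH_TERM_VECTORS for a wildcard query, which corresponds to the hybrid case the description mentions. The actual selection rules in the patch may differ.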



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)

---------------------------------------------------------------------
To unsubscribe, e-mail: dev-unsubscribe@lucene.apache.org
For additional commands, e-mail: dev-help@lucene.apache.org

