lucene-dev mailing list archives

From "ASF GitHub Bot (JIRA)" <j...@apache.org>
Subject [jira] [Commented] (LUCENE-7438) UnifiedHighlighter
Date Wed, 07 Sep 2016 15:11:20 GMT

    [ https://issues.apache.org/jira/browse/LUCENE-7438?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15470875#comment-15470875 ]

ASF GitHub Bot commented on LUCENE-7438:
----------------------------------------

GitHub user Timothy055 opened a pull request:

    https://github.com/apache/lucene-solr/pull/79

    LUCENE-7438 UnifiedHighlighter

    Initial pull request for [LUCENE-7438](https://issues.apache.org/jira/browse/LUCENE-7438)

You can merge this pull request into a Git repository by running:

    $ git pull https://github.com/Timothy055/lucene-solr master

Alternatively you can review and apply these changes as the patch at:

    https://github.com/apache/lucene-solr/pull/79.patch

To close this pull request, make a commit to your master/trunk branch
with (at least) the following in the commit message:

    This closes #79
    
----
commit 02e932c4a6146363680b88f4947a693c6697c955
Author: Timothy Rodriguez <trodriguez25@bloomberg.net>
Date:   2016-09-01T19:23:50Z

    Initial fork of PostingsHighlighter for UnifiedHighlighter

commit 9d88411b3985a98851384d78d681431dba710e89
Author: Timothy Rodriguez <trodriguez25@bloomberg.net>
Date:   2016-09-01T23:17:06Z

    Initial commit of the UnifiedHighlighter for OSS contribution

commit e45e39bc4b07ea33e4423b264c2fefb9aa08777a
Author: David Smiley <david.w.smiley@gmail.com>
Date:   2016-09-02T12:45:49Z

    Fix misc issues; "ant test" now works. (#1)

commit 046a28ef31acf4cea7d255bbbb4b827e6a714e3d
Author: Timothy Rodriguez <trodriguez25@bloomberg.net>
Date:   2016-09-02T20:58:31Z

    Minor refactoring of the AnalysisFieldHighlighter

commit ccd1a2280abd4b48cfef8122696e5d9cfd12920f
Author: David Smiley <dsmiley@apache.org>
Date:   2016-09-03T12:55:20Z

    AbstractFieldHighlighter: order methods more sensibly; renamed a couple.

commit d4714a04a3e41d5e95bbe942b275c32ed69b9c2e
Author: David Smiley <dsmiley@apache.org>
Date:   2016-09-04T01:03:29Z

    Improve javadocs and @lucene.external/internal labeling & scope.
    "ant precommit" now passes.

commit e0659f18a59bf2893076da6d7643ff30f2fa5a52
Author: David Smiley <dsmiley@apache.org>
Date:   2016-09-04T01:25:55Z

    Analysis: remove dubious filter() method

commit ccd7ce707bff2c06da89b31853cca9aecea72008
Author: David Smiley <dsmiley@apache.org>
Date:   2016-09-04T01:44:01Z

    getStrictPhraseHelper -> rm "Strict", getHighlightAccuracy -> getFlags, and only
    call filterExtractedTerms once.

commit ffc2a22c700b8abcbf87673d5d05bb3659d177c9
Author: David Smiley <david.w.smiley@gmail.com>
Date:   2016-09-04T15:21:08Z

    UnifiedHighlighter round 2 (#2)
    
    * AbstractFieldHighlighter: order methods more sensibly; renamed a couple.
    
    * Improve javadocs and @lucene.external/internal labeling & scope.
    "ant precommit" now passes.
    
    * Analysis: remove dubious filter() method
    
    * getStrictPhraseHelper -> rm "Strict", getHighlightAccuracy -> getFlags, and only
    call filterExtractedTerms once.

commit 5f95e05595db462d3ab5bffc68c2c92f70875072
Author: David Smiley <dsmiley@apache.org>
Date:   2016-09-04T16:12:33Z

    Refactor: FieldOffsetStrategy

commit 86fb6265fbbdb955ead6d4baf944bf708175715e
Author: David Smiley <dsmiley@apache.org>
Date:   2016-09-04T16:21:32Z

    stop passing maxPassages into highlightFieldForDoc()

commit f6fd80544eae9fab953b94b1e9346c0883f956eb
Author: David Smiley <dsmiley@apache.org>
Date:   2016-09-04T16:12:33Z

    Refactor: FieldOffsetStrategy

commit b335a673c2ce45904890c1e9af7cbfda2bd27b0f
Author: David Smiley <dsmiley@apache.org>
Date:   2016-09-04T16:21:32Z

    stop passing maxPassages into highlightFieldForDoc()

commit 478db9437b92214cbf459f82ba2e3a67c966a150
Author: David Smiley <dsmiley@apache.org>
Date:   2016-09-04T18:29:44Z

    Rename subclasses of FieldOffsetStrategy.

commit dbf4280755c11420a5032445cd618fadb7444b61
Author: David Smiley <dsmiley@apache.org>
Date:   2016-09-04T18:31:34Z

    Re-order and harmonize params on methods called by UH.getFieldHighlighter()

commit f0340e27e61dcda2e11992f08ec07a72fad6c24c
Author: David Smiley <dsmiley@apache.org>
Date:   2016-09-04T18:53:51Z

    FieldHighlighter: harmonize field/param order. And don't apply maxNoHighlightPasses twice.

commit 817f63c1d48fd523c13b9c40a2ae9b8a4047209a
Author: Timothy Rodriguez <trodriguez25@bloomberg.net>
Date:   2016-09-06T20:43:20Z

    Merge of renaming changes

commit 0f644a4f53c1ed4d41d562848f6fe51a87442a75
Author: Timothy Rodriguez <trodriguez25@bloomberg.net>
Date:   2016-09-06T20:54:13Z

    add visibility tests

commit 9171f49e117085e7d086267bb73836831ff07f8e
Author: Timothy Rodriguez <trodriguez25@bloomberg.net>
Date:   2016-09-07T14:26:59Z

    ADd additional extensibility test

----


> UnifiedHighlighter
> ------------------
>
>                 Key: LUCENE-7438
>                 URL: https://issues.apache.org/jira/browse/LUCENE-7438
>             Project: Lucene - Core
>          Issue Type: Improvement
>          Components: modules/highlighter
>    Affects Versions: 6.2
>            Reporter: Timothy M. Rodriguez
>            Assignee: David Smiley
>
> The UnifiedHighlighter is an evolution of the PostingsHighlighter that is able to highlight
> using offsets in either postings, term vectors, or from analysis (a TokenStream). Lucene’s
> existing highlighters are mostly demarcated along offset source lines, whereas here it is
> unified -- hence this proposed name. In this highlighter, the offset source strategy is separated
> from the core highlighting functionality. The UnifiedHighlighter further improves on the PostingsHighlighter’s
> design by supporting accurate phrase highlighting using an approach similar to the standard
> highlighter’s WeightedSpanTermExtractor. The next major improvement is a hybrid offset source
> strategy that utilizes postings and “light” term vectors (i.e. just the terms) for highlighting
> multi-term queries (wildcards) without resorting to analysis. Phrase highlighting and wildcard
> highlighting can both be disabled if you’d rather highlight a little faster, albeit not as
> accurately reflecting the query.
>
> We’ve benchmarked an earlier version of this highlighter comparing it to the other
> highlighters and the results were exciting! It’s tempting to share those results but it’s
> definitely due for another benchmark, so we’ll work on that. Performance was the main motivator
> for creating the UnifiedHighlighter, as the standard Highlighter (the only one meeting Bloomberg
> Law’s accuracy requirements) wasn’t fast enough, even with term vectors along with several
> improvements we contributed back, and even after we forked it to highlight in multiple threads.
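
The description says the offset source strategy is kept separate from the core highlighting logic. A minimal, self-contained Java sketch of that idea follows. It is an illustration under stated assumptions, not code from the pull request: every name in it (OffsetSourceSketch, OffsetSource, ReanalysisSource, StoredOffsetsSource) is hypothetical, the tokenizer is a crude stand-in for a real Analyzer, and the stored-offsets map is a stand-in for postings or term vectors. The commit log mentions a FieldOffsetStrategy class, which may play a similar role; this sketch does not attempt to reproduce it.

```java
import java.util.ArrayList;
import java.util.Collections;
import java.util.List;
import java.util.Locale;
import java.util.Map;
import java.util.Set;

// Hypothetical illustration only: none of these names come from the pull request.
final class OffsetSourceSketch {

    /** A matched character range [start, end) within the field text. */
    static final class Offset {
        final int start;
        final int end;

        Offset(int start, int end) {
            this.start = start;
            this.end = end;
        }
    }

    /** Strategy interface: decides where the match offsets come from. */
    interface OffsetSource {
        List<Offset> offsets(String text, Set<String> terms);
    }

    /** Stand-in for the analysis route: re-tokenize the text at highlight time. */
    static final class ReanalysisSource implements OffsetSource {
        @Override
        public List<Offset> offsets(String text, Set<String> terms) {
            List<Offset> found = new ArrayList<>();
            int i = 0;
            while (i < text.length()) {
                if (!Character.isLetterOrDigit(text.charAt(i))) {
                    i++;
                    continue;
                }
                int start = i;
                while (i < text.length() && Character.isLetterOrDigit(text.charAt(i))) {
                    i++;
                }
                // Lowercase the token so it compares against lowercase query terms.
                String token = text.substring(start, i).toLowerCase(Locale.ROOT);
                if (terms.contains(token)) {
                    found.add(new Offset(start, i));
                }
            }
            return found;
        }
    }

    /** Stand-in for postings or term vectors: offsets were recorded ahead of time. */
    static final class StoredOffsetsSource implements OffsetSource {
        private final Map<String, List<Offset>> offsetsByTerm;

        StoredOffsetsSource(Map<String, List<Offset>> offsetsByTerm) {
            this.offsetsByTerm = offsetsByTerm;
        }

        @Override
        public List<Offset> offsets(String text, Set<String> terms) {
            List<Offset> found = new ArrayList<>();
            for (String term : terms) {
                found.addAll(offsetsByTerm.getOrDefault(term, Collections.<Offset>emptyList()));
            }
            return found;
        }
    }

    /** Core highlighting: wraps each match in <b> tags without knowing the offset source. */
    static String highlight(String text, Set<String> terms, OffsetSource source) {
        List<Offset> offsets = new ArrayList<>(source.offsets(text, terms));
        offsets.sort((a, b) -> Integer.compare(a.start, b.start));
        StringBuilder out = new StringBuilder();
        int pos = 0;
        for (Offset o : offsets) {
            if (o.start < pos) {
                continue; // skip matches that overlap one already emitted
            }
            out.append(text, pos, o.start)
               .append("<b>").append(text, o.start, o.end).append("</b>");
            pos = o.end;
        }
        out.append(text, pos, text.length());
        return out.toString();
    }

    public static void main(String[] args) {
        String text = "Lucene highlights Lucene terms";
        Set<String> terms = Collections.singleton("lucene");
        System.out.println(highlight(text, terms, new ReanalysisSource()));
    }
}
```

Running main prints "<b>Lucene</b> highlights <b>Lucene</b> terms". Passing a StoredOffsetsSource built from the same two ranges, (0, 6) and (18, 24), yields the same string, which is the point of the separation: the highlight method stays unchanged while the source of offsets varies.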



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)

---------------------------------------------------------------------
To unsubscribe, e-mail: dev-unsubscribe@lucene.apache.org
For additional commands, e-mail: dev-help@lucene.apache.org

