lucene-dev mailing list archives

From "David Smiley (JIRA)" <j...@apache.org>
Subject [jira] [Commented] (SOLR-11516) Unified highlighter with word separator never gives context to the left
Date Fri, 20 Oct 2017 16:34:00 GMT

    [ https://issues.apache.org/jira/browse/SOLR-11516?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16212850#comment-16212850 ]

David Smiley commented on SOLR-11516:
-------------------------------------

Hi,
For the UnifiedHighlighter, hl.bs.type is essentially a pluggable way to establish a snippet
boundary.  The values WORD and CHARACTER are technically supported but probably make no sense.
The default is SENTENCE.

Note that the FastVectorHighlighter uses the same parameter name and values but with a different
meaning -- and under that meaning, WORD is likely what you'd want, and it's the default for
that highlighter.

When you use the UH with the default hl.bs.type, what snippeting challenges do you face?

hl.fragsize is supported but its fidelity is to the hl.bs.type unit -- generally a sentence
boundary.  With the original Highlighter, the fidelity was to the word edge, which meant a
fragment very likely chopped off a sentence, which isn't great.
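
To try the unified highlighter with its default sentence boundaries, the request from the issue description can be rebuilt with hl.bs.type=SENTENCE. The Python sketch below only constructs the URL; the host, core, and field are taken from the reporter's techproducts example, and the helper name unified_highlight_url is hypothetical, not part of any Solr client.

```python
from urllib.parse import parse_qs, urlencode, urlsplit

# Endpoint taken from the reporter's techproducts example; adjust for your setup.
SOLR_SELECT = "http://localhost:8983/solr/techproducts/select"


def unified_highlight_url(query, field, fragsize=None, bs_type="SENTENCE"):
    """Build a select URL that asks for unified-highlighter snippets.

    Hypothetical helper for illustration. bs_type defaults to SENTENCE,
    the unified highlighter's default snippet boundary.
    """
    params = {
        "q": query,
        "hl": "on",
        "hl.method": "unified",
        "hl.fl": field,
        "hl.bs.type": bs_type,
    }
    # Only send hl.fragsize when the caller sets it explicitly.
    if fragsize is not None:
        params["hl.fragsize"] = fragsize
    return SOLR_SELECT + "?" + urlencode(params)


url = unified_highlight_url("apple", "features", fragsize=30)
print(url)
# Show the boundary type that will actually be sent.
print(parse_qs(urlsplit(url).query)["hl.bs.type"])
```

Passing bs_type="WORD" reproduces the request from the issue description, which makes it easy to compare the two boundary types side by side.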

> Unified highlighter with word separator never gives context to the left
> -----------------------------------------------------------------------
>
>                 Key: SOLR-11516
>                 URL: https://issues.apache.org/jira/browse/SOLR-11516
>             Project: Solr
>          Issue Type: Bug
>      Security Level: Public(Default Security Level. Issues are Public) 
>          Components: highlighter
>    Affects Versions: 6.4, 7.1
>            Reporter: Tim Retout
>
> When using the unified highlighter with hl.bs.type=WORD, I am not able to get context
> to the left of the matches returned; only words to the right of each match are shown.  I see
> this behaviour on both Solr 6.4 and Solr 7.1.
> Without context to the left of a match, the highlighted snippets are much less useful
> for understanding where the match appears in a document.
> As an example, using the techproducts data with Solr 7.1, given a search for "apple",
> highlighting the "features" field:
> http://localhost:8983/solr/techproducts/select?hl.fl=features&hl=on&q=apple&hl.bs.type=WORD&hl.fragsize=30&hl.method=unified
> I see this snippet:
> "<em>Apple</em> Lossless, H.264 video"
> Note that "Apple" is anchored to the left.  Compare with the original highlighter:
> http://localhost:8983/solr/techproducts/select?hl.fl=features&hl=on&q=apple&hl.fragsize=30
> And the match has context either side:
> ", Audible, <em>Apple</em> Lossless, H.264 video"
> (To complicate this, in general I am not sure that the unified highlighter is respecting
> the hl.fragsize parameter, although [SOLR-9935] suggests support was added.  I included the
> hl.fragsize param in the unified URL too, but it's making no difference unless set to 0.)



--
This message was sent by Atlassian JIRA
(v6.4.14#64029)

---------------------------------------------------------------------
To unsubscribe, e-mail: dev-unsubscribe@lucene.apache.org
For additional commands, e-mail: dev-help@lucene.apache.org

