lucene-dev mailing list archives

From "ASF subversion and git services (JIRA)" <j...@apache.org>
Subject [jira] [Commented] (SOLR-6692) hl.maxAnalyzedChars should apply cumulatively on a multi-valued field
Date Mon, 13 Apr 2015 16:30:12 GMT

    [ https://issues.apache.org/jira/browse/SOLR-6692?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14492595#comment-14492595 ]

ASF subversion and git services commented on SOLR-6692:
-------------------------------------------------------

Commit 1673237 from [~dsmiley] in branch 'dev/branches/branch_5x'
[ https://svn.apache.org/r1673237 ]

SOLR-6692: When using hl.maxMultiValuedToMatch with hl.preserveMulti, only count matched snippets.

> hl.maxAnalyzedChars should apply cumulatively on a multi-valued field
> ---------------------------------------------------------------------
>
>                 Key: SOLR-6692
>                 URL: https://issues.apache.org/jira/browse/SOLR-6692
>             Project: Solr
>          Issue Type: Improvement
>          Components: highlighter
>            Reporter: David Smiley
>            Assignee: David Smiley
>             Fix For: 5.2
>
>         Attachments: SOLR-6692_hl_maxAnalyzedChars_cumulative_multiValued,_and_more.patch
>
>
> In DefaultSolrHighlighter, the hl.maxAnalyzedChars figure is used to constrain how much
> text is analyzed before the highlighter stops, in the interests of performance.  For a
> multi-valued field, it effectively treats each value anew, no matter how much text was
> previously analyzed for other values of the same field in the current document.  The
> PostingsHighlighter doesn't work this way -- hl.maxAnalyzedChars is effectively the total
> budget for a field for a document, no matter how many values there might be.  It's not
> reset for each value.  I think this makes more sense.  When we loop over the values, we
> should subtract from hl.maxAnalyzedChars the length of the value just checked.  The
> motivation here is consistency with PostingsHighlighter, and to allow for
> hl.maxAnalyzedChars to be pushed down to term vector uninversion, which wouldn't be
> possible for multi-valued fields based on the current way this parameter is used.
> Interestingly, I noticed Solr's use of FastVectorHighlighter doesn't honor
> hl.maxAnalyzedChars, as the FVH doesn't have a knob for that.  It does have hl.phraseLimit,
> which is a limit that could be used for a similar purpose, albeit applied differently.
> Furthermore, DefaultSolrHighlighter.doHighlightingByHighlighter should exit early from
> its field value loop if it reaches hl.snippets, and if hl.preserveMulti=true.
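
To make the difference between the two budgeting behaviors concrete, here is a minimal sketch. It is not Solr's code: the class CumulativeBudgetSketch and its methods are hypothetical names chosen for illustration, and plain string lengths stand in for the text the highlighter would analyze. It only shows the arithmetic described above, where the length of each value just checked is subtracted from the remaining hl.maxAnalyzedChars budget.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;

// Hypothetical illustration only; not taken from DefaultSolrHighlighter.
public class CumulativeBudgetSketch {

    // Per-value budgeting: each value of the field gets the full limit again.
    public static List<Integer> perValue(List<String> values, int maxAnalyzedChars) {
        List<Integer> analyzed = new ArrayList<>();
        for (String value : values) {
            analyzed.add(Math.min(value.length(), maxAnalyzedChars));
        }
        return analyzed;
    }

    // Cumulative budgeting: one limit is shared by all values of the field.
    public static List<Integer> cumulative(List<String> values, int maxAnalyzedChars) {
        List<Integer> analyzed = new ArrayList<>();
        int remaining = maxAnalyzedChars;
        for (String value : values) {
            if (remaining <= 0) {
                break; // budget exhausted, so later values are not analyzed at all
            }
            analyzed.add(Math.min(value.length(), remaining));
            remaining -= value.length(); // subtract the length of the value just checked
        }
        return analyzed;
    }

    public static void main(String[] args) {
        List<String> values = Arrays.asList("aaaa", "bbbbbb", "cc");
        System.out.println(perValue(values, 7));   // prints [4, 6, 2]
        System.out.println(cumulative(values, 7)); // prints [4, 3]
    }
}
```

With a limit of 7, the per-value variant analyzes 12 characters in total across the three values, while the cumulative variant stops at 7 and never looks at the third value.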



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)

---------------------------------------------------------------------
To unsubscribe, e-mail: dev-unsubscribe@lucene.apache.org
For additional commands, e-mail: dev-help@lucene.apache.org

