lucene-dev mailing list archives

From "Mark Miller (JIRA)" <>
Subject [jira] Commented: (LUCENE-1321) Highlight fragment does not extend to maxDocCharsToAnalyze
Date Mon, 30 Jun 2008 12:35:45 GMT


Mark Miller commented on LUCENE-1321:

Thanks Lars. Nice catch - not an easy spot <g> Looks good to me. When I get a few free
minutes I'll go over it a bit more, but on first inspection, certainly looks like the right
fix and all tests pass.

> Highlight fragment does not extend to maxDocCharsToAnalyze
> ----------------------------------------------------------
>                 Key: LUCENE-1321
>                 URL:
>             Project: Lucene - Java
>          Issue Type: Bug
>          Components: contrib/highlighter
>    Affects Versions: 2.4
>            Reporter: Lars Kotthoff
>            Assignee: Mark Miller
>            Priority: Minor
>         Attachments: LUCENE-1321.patch
> The current highlighter code checks whether the total length of the text to highlight
> is strictly smaller than maxDocCharsToAnalyze before adding any text remaining after the last
> token to the fragment. This means that if maxDocCharsToAnalyze is set to exactly the length
> of the text, and the last token of the text is the term to highlight and is followed by
> non-token text, this non-token text will not be included in the fragment.
> For example, consider the phrase "this is a text with searchterm in it". "In" and "it"
> are not tokenized because they're stopwords. Setting maxDocCharsToAnalyze to 36 (the length
> of the sentence) and searching for "searchterm" gives a fragment ending in "searchterm". The
> expected behaviour is to have "in it" at the end of the fragment, since maxDocCharsToAnalyze
> explicitly states that the whole phrase should be considered.
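
The boundary condition described in the issue can be illustrated with a small standalone sketch. This is not the highlighter's actual code and does not use the Lucene API: the class FragmentTailDemo, its methods, and the whitespace tokenizer with a two-word stopword set are all hypothetical stand-ins. It only models the comparison between the text length and maxDocCharsToAnalyze, with a flag to switch between the strict check reported in the issue and an inclusive one, which is assumed here to be the shape of the fix.

```java
import java.util.Arrays;
import java.util.HashSet;
import java.util.Set;

/** Simplified model of the check described in LUCENE-1321; not Lucene code. */
class FragmentTailDemo {

    // Hypothetical stopword set, reduced to the two words from the example.
    static final Set<String> STOPWORDS = new HashSet<>(Arrays.asList("in", "it"));

    /** End offset of the last non-stopword token that ends within maxChars. */
    static int lastTokenEndOffset(String text, int maxChars) {
        int lastEnd = 0;
        int i = 0;
        while (i < text.length()) {
            // Skip whitespace between words.
            while (i < text.length() && Character.isWhitespace(text.charAt(i))) i++;
            int start = i;
            while (i < text.length() && !Character.isWhitespace(text.charAt(i))) i++;
            if (start == i) break;      // only trailing whitespace was left
            if (i > maxChars) break;    // word ends beyond the analysis limit
            String word = text.substring(start, i).toLowerCase();
            if (!STOPWORDS.contains(word)) lastEnd = i; // stopwords yield no token
        }
        return lastEnd;
    }

    /**
     * Builds the fragment text. With inclusive == false the tail is appended
     * only if text.length() < maxChars (the reported behaviour); with
     * inclusive == true the comparison is <= (the assumed fix).
     */
    static String buildFragment(String text, int maxChars, boolean inclusive) {
        int lastEnd = lastTokenEndOffset(text, maxChars);
        StringBuilder fragment = new StringBuilder(text.substring(0, lastEnd));
        boolean withinLimit = inclusive
                ? text.length() <= maxChars
                : text.length() < maxChars;
        if (lastEnd < text.length() && withinLimit) {
            // Append the non-token text that follows the last token.
            fragment.append(text, lastEnd, text.length());
        }
        return fragment.toString();
    }

    public static void main(String[] args) {
        String text = "this is a text with searchterm in it"; // 36 characters
        System.out.println("[" + buildFragment(text, 36, false) + "]");
        System.out.println("[" + buildFragment(text, 36, true) + "]");
    }
}
```

With the limit set to 36, the strict variant stops at "searchterm", while the inclusive variant keeps the trailing " in it". Raising the limit to 37 makes the strict variant include the tail as well, which is why the problem only shows up when the limit equals the text length exactly.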
