lucene-dev mailing list archives

From "Esther Quansah (JIRA)" <>
Subject [jira] [Commented] (SOLR-8212) Standard Highlighter Inconsistent with NGram Tokenizer
Date Thu, 12 Nov 2015 17:03:10 GMT


Esther Quansah commented on SOLR-8212:

Update: problem identified: in,  private static final int MAX_NUM_TOKENS_PER_GROUP
= 50. Terms where the query match sits farther into the word (bronchos*co*py, blood *ca*ncer,
etc.) end up with 50+ tokens, so private int matchStartOffset and private int matchEndOffset
are not calculated correctly in void addToken(), and the entire term is eventually returned with no
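
To make the effect of the cap concrete, here is a minimal, self-contained sketch of the behavior described above. It is not the highlighter's actual code: the class TokenGroupCapSketch and its methods are hypothetical, the n-gram range (1 to the full word length) and the start-then-length emission order are assumptions, and the matching logic is a simplified stand-in for what addToken() does. Only the constant's name and value come from the comment.

```java
// Simplified model of a token group that stops accepting tokens at a cap.
// Hypothetical names throughout; only MAX_NUM_TOKENS_PER_GROUP = 50 is from the comment.
class TokenGroupCapSketch {
    static final int MAX_NUM_TOKENS_PER_GROUP = 50;

    /** Counts the n-grams of text with lengths minGram..maxGram (assumed gram range). */
    static int countGrams(String text, int minGram, int maxGram) {
        int count = 0;
        for (int start = 0; start < text.length(); start++) {
            for (int len = minGram; len <= maxGram && start + len <= text.length(); len++) {
                count++;
            }
        }
        return count;
    }

    /**
     * Walks the n-grams in start-then-length order (an assumption) and records the
     * offsets of grams equal to query. Grams arriving after `cap` tokens are ignored.
     * Returns {matchStart, matchEnd}, or {-1, -1} if no matching gram was accepted.
     */
    static int[] matchOffsets(String text, String query, int minGram, int maxGram, int cap) {
        int numTokens = 0;
        int matchStart = -1;
        int matchEnd = -1;
        for (int start = 0; start < text.length(); start++) {
            for (int len = minGram; len <= maxGram && start + len <= text.length(); len++) {
                if (numTokens >= cap) {
                    return new int[] {matchStart, matchEnd}; // group is full
                }
                if (text.substring(start, start + len).equals(query)) {
                    matchStart = matchStart < 0 ? start : Math.min(matchStart, start);
                    matchEnd = Math.max(matchEnd, start + len);
                }
                numTokens++;
            }
        }
        return new int[] {matchStart, matchEnd};
    }

    /** Wraps the matched span in <em> tags, or returns text unchanged if nothing matched. */
    static String highlight(String text, String query, int minGram, int maxGram, int cap) {
        int[] m = matchOffsets(text, query, minGram, maxGram, cap);
        if (m[0] < 0) {
            return text;
        }
        return text.substring(0, m[0]) + "<em>" + text.substring(m[0], m[1]) + "</em>"
                + text.substring(m[1]);
    }

    public static void main(String[] args) {
        String term = "bronchoscopy";
        int max = term.length();
        System.out.println(countGrams(term, 1, max));                                 // 78
        System.out.println(highlight(term, "br", 1, max, MAX_NUM_TOKENS_PER_GROUP));  // <em>br</em>onchoscopy
        System.out.println(highlight(term, "co", 1, max, MAX_NUM_TOKENS_PER_GROUP));  // bronchoscopy
        System.out.println(highlight(term, "co", 1, max, Integer.MAX_VALUE));         // bronchos<em>co</em>py
    }
}
```

Under these assumptions "bronchoscopy" yields 78 grams and the "co" gram is the 70th, so a group capped at 50 never sees it, while an early match such as "br" is still highlighted. The real gram range of the title_contains field may differ, which would change the counts but not the mechanism.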

> Standard Highlighter Inconsistent with NGram Tokenizer
> ------------------------------------------------------
>                 Key: SOLR-8212
>                 URL:
>             Project: Solr
>          Issue Type: Bug
>            Reporter: Esther Quansah
>            Priority: Minor
>         Attachments: SOLR-8212.patch
> Noticing some inconsistent behavior with the Standard Highlighter and its function on
terms that use the NGram Tokenizer. Ex: 
> I created a field called "title_contains" which uses the NGram Tokenizer and I indexed
the term "bronchoscopy". Querying "co" on the title_contains field should return "bronchos<em>co</em>py",
but the Standard highlighter returns "bronchoscopy" without the highlighting information.
> I created a test called testNgram() which tests the above example using (1) the Standard
Highlighter on the ngram field type and (2) the Fast Vector Highlighter on the ngram field
type. The first fails and the second passes. 

This message was sent by Atlassian JIRA
