lucene-dev mailing list archives

From "Shawn Heisey (JIRA)"
Subject [jira] [Created] (LUCENE-7960) NGram filters -- add option to keep short terms
Date Wed, 06 Sep 2017 18:26:00 GMT
Shawn Heisey created LUCENE-7960:

             Summary: NGram filters -- add option to keep short terms
                 Key: LUCENE-7960
             Project: Lucene - Core
          Issue Type: Improvement
          Components: modules/analysis
            Reporter: Shawn Heisey

When ngram or edgengram filters are used, any terms that are shorter than the minGramSize
are completely removed from the token stream.

This is probably 100% what was intended, but I've seen it cause a lot of problems for users.
I am not suggesting that the default behavior be changed. That would be far too disruptive
to the existing user base.

I do think there should be a new boolean option, with a name like keepShortTerms, that defaults
to false, to allow the short terms to be preserved.
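
To illustrate the difference, here is a minimal plain-Java sketch of edge n-gram generation with and without such an option. It does not use Lucene's classes; the class name EdgeNGramSketch and the method edgeNGrams are made up for this example, and keepShortTerms is only the name suggested above, not an existing parameter.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;

// Plain-Java illustration only; this does not use or mirror Lucene's filter classes.
class EdgeNGramSketch {

    // Emits the leading-edge n-grams of each term for sizes minGram..maxGram.
    // keepShortTerms is the hypothetical option proposed in this issue.
    static List<String> edgeNGrams(List<String> terms, int minGram, int maxGram,
                                   boolean keepShortTerms) {
        List<String> out = new ArrayList<>();
        for (String term : terms) {
            if (term.length() < minGram) {
                // Behavior described in the issue: a short term yields no grams
                // and disappears from the output.
                if (keepShortTerms) {
                    out.add(term); // proposed: pass the short term through unchanged
                }
                continue;
            }
            int limit = Math.min(maxGram, term.length());
            for (int n = minGram; n <= limit; n++) {
                out.add(term.substring(0, n));
            }
        }
        return out;
    }

    public static void main(String[] args) {
        List<String> terms = Arrays.asList("go", "lucene");
        System.out.println(edgeNGrams(terms, 3, 4, false)); // [luc, luce]
        System.out.println(edgeNGrams(terms, 3, 4, true));  // [go, luc, luce]
    }
}
```

With minGram set to 3, the two-character term "go" is lost entirely unless the flag is set, which is the case that surprises users when they later search for the short term.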
