lucene-dev mailing list archives

From "Shawn Heisey (JIRA)" <j...@apache.org>
Subject [jira] [Created] (LUCENE-7960) NGram filters -- add option to keep short terms
Date Wed, 06 Sep 2017 18:26:00 GMT
Shawn Heisey created LUCENE-7960:
------------------------------------

             Summary: NGram filters -- add option to keep short terms
                 Key: LUCENE-7960
                 URL: https://issues.apache.org/jira/browse/LUCENE-7960
             Project: Lucene - Core
          Issue Type: Improvement
          Components: modules/analysis
            Reporter: Shawn Heisey


When ngram or edgengram filters are used, any terms that are shorter than the minGramSize
are completely removed from the token stream.

This is probably exactly what was intended, but I've seen it cause a lot of problems for users.
I am not suggesting that the default behavior be changed.  That would be far too disruptive
to the existing user base.

I do think there should be a new boolean option, with a name like keepShortTerms, that defaults
to false and, when set to true, keeps terms shorter than minGramSize in the token stream
instead of dropping them.
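
To make the request concrete, here is a minimal sketch in plain Java of the behavior described above. It is only an illustration, not Lucene code: the class ShortTermNGramSketch and the method edgeNGrams are hypothetical names, the logic works on a list of strings rather than a real token stream, and keepShortTerms is the option name suggested in this issue, not an existing parameter.

```java
import java.util.ArrayList;
import java.util.List;

// Plain-Java illustration only; this does not use or reproduce Lucene's classes.
public class ShortTermNGramSketch {

    /**
     * Emits edge n-grams (prefixes) of lengths minGram..maxGram for each term.
     * keepShortTerms is the hypothetical option proposed in this issue.
     */
    public static List<String> edgeNGrams(List<String> terms, int minGram, int maxGram,
                                          boolean keepShortTerms) {
        List<String> out = new ArrayList<>();
        for (String term : terms) {
            if (term.length() < minGram) {
                // With the flag off, the short term yields no grams and vanishes.
                // With the flag on, it is passed through unchanged.
                if (keepShortTerms) {
                    out.add(term);
                }
                continue;
            }
            // Never ask for a prefix longer than the term itself.
            int longest = Math.min(maxGram, term.length());
            for (int len = minGram; len <= longest; len++) {
                out.add(term.substring(0, len));
            }
        }
        return out;
    }
}
```

With minGram = 3 and maxGram = 4, the input terms "a" and "lucene" produce only the grams "luc" and "luce" when keepShortTerms is false, so "a" can never match at search time. With keepShortTerms set to true, "a" stays in the output ahead of those grams.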




--
This message was sent by Atlassian JIRA
(v6.4.14#64029)

---------------------------------------------------------------------
To unsubscribe, e-mail: dev-unsubscribe@lucene.apache.org
For additional commands, e-mail: dev-help@lucene.apache.org

