lucene-dev mailing list archives

From "David Mason (Created) (JIRA)" <j...@apache.org>
Subject [jira] [Created] (LUCENE-3979) NGramTokenizer
Date Fri, 13 Apr 2012 03:43:38 GMT
NGramTokenizer
--------------

                 Key: LUCENE-3979
                 URL: https://issues.apache.org/jira/browse/LUCENE-3979
             Project: Lucene - Java
          Issue Type: Bug
          Components: modules/analysis
    Affects Versions: 3.0, 2.9.2
         Environment: n/a
            Reporter: David Mason
            Priority: Minor


org.apache.lucene.analysis.ngram.NGramTokenizer trims leading and trailing whitespace from its
input, making searches for literal strings like " test" and "test " equivalent to "test".
Whitespace-sensitive searching is sometimes desired, particularly where n-grams are used.

This could be fixed either by removing .trim() from the line shown below, or by providing
a flag that explicitly sets the trimming behaviour (keeping trim=true as the default so that
existing code using this tokenizer is not broken).

111: inStr = new String(chars).trim();  // remove any trailing empty strings 
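
To illustrate the effect and the proposed flag, here is a minimal standalone sketch. It is not Lucene's code: the class NGramTrimSketch, the method ngrams, and the boolean parameter trim are hypothetical names invented for this example, and the order in which the grams are produced is incidental. It only shows how applying String.trim() before gramming drops the whitespace-bearing grams.

```java
import java.util.ArrayList;
import java.util.List;

// Hypothetical illustration only; this is NOT the Lucene NGramTokenizer.
public class NGramTrimSketch {

    /**
     * Returns all character n-grams of the input with lengths minGram..maxGram.
     * When trim is true, leading and trailing whitespace is stripped first,
     * mirroring the .trim() call quoted in the report. The trim parameter is
     * an assumed name for the proposed flag.
     */
    public static List<String> ngrams(String input, int minGram, int maxGram, boolean trim) {
        String s = trim ? input.trim() : input;
        List<String> grams = new ArrayList<String>();
        for (int n = minGram; n <= maxGram; n++) {
            for (int start = 0; start + n <= s.length(); start++) {
                grams.add(s.substring(start, start + n));
            }
        }
        return grams;
    }

    public static void main(String[] args) {
        // Current behaviour: the leading space is lost.
        System.out.println(ngrams(" test", 2, 2, true));   // prints [te, es, st]
        // Proposed opt-out: the gram " t" survives.
        System.out.println(ngrams(" test", 2, 2, false));  // prints [ t, te, es, st]
    }
}
```

With trim=true, " test", "test " and "test" all yield the same grams, which is the equivalence described above. With trim=false, the grams " t" and "t " are kept, so the three inputs can be told apart.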


