lucene-dev mailing list archives

From "Ryan Ernst (JIRA)" <j...@apache.org>
Subject [jira] [Created] (LUCENE-5927) 4.9 -> 4.10 change in StandardTokenizer behavior on \u1aa2
Date Mon, 08 Sep 2014 20:12:28 GMT
Ryan Ernst created LUCENE-5927:
----------------------------------

             Summary: 4.9 -> 4.10 change in StandardTokenizer behavior on \u1aa2
                 Key: LUCENE-5927
                 URL: https://issues.apache.org/jira/browse/LUCENE-5927
             Project: Lucene - Core
          Issue Type: Bug
            Reporter: Ryan Ernst


In 4.9, StandardTokenizer broke this string into 2 tokens:
"\u1aa2\u1a7f\u1a6f\u1a6f\u1a61\u1a72" => "\u1aa2", "\u1a7f\u1a6f\u1a6f\u1a61\u1a72"

However, in 4.10 the same string is returned as a single token.
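
One way to start narrowing this down is to look at the code points in the reported string. The following is a minimal sketch that uses only the JDK and does not call Lucene. The class name CodePointInspector and its describe helper are hypothetical names introduced for illustration. The sketch prints each code point with the general category the running JDK assigns to it, which may help when comparing against the word-break rules the tokenizer uses. The values printed depend on the Unicode version bundled with the JDK, so no expected output is listed.

```java
import java.util.ArrayList;
import java.util.List;

// Hypothetical helper, not part of Lucene: lists the code points of the
// string from the report together with their JDK general category.
public class CodePointInspector {
    // The six-character string from the report, written with Java escapes.
    static final String INPUT = "\u1aa2\u1a7f\u1a6f\u1a6f\u1a61\u1a72";

    // Returns one description line per code point in s.
    static List<String> describe(String s) {
        List<String> out = new ArrayList<>();
        for (int i = 0; i < s.length(); ) {
            int cp = s.codePointAt(i);
            out.add(String.format("U+%04X type=%d letterOrDigit=%b",
                    cp, Character.getType(cp), Character.isLetterOrDigit(cp)));
            // Advance by one or two chars, depending on the code point.
            i += Character.charCount(cp);
        }
        return out;
    }

    public static void main(String[] args) {
        for (String line : describe(INPUT)) {
            System.out.println(line);
        }
    }
}
```

The type value is the integer constant from java.lang.Character (for example Character.NON_SPACING_MARK or Character.OTHER_PUNCTUATION). Running the same program on the JDKs used with 4.9 and 4.10 would show whether the JDK's view of these characters differs between the two setups, although the tokenizer's own grammar may not depend on the JDK's Unicode tables at all.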




