lucene-dev mailing list archives

From "Robert Muir (JIRA)" <j...@apache.org>
Subject [jira] Commented: (LUCENE-1545) Standard analyzer does not correctly tokenize combining character U+0364 COMBINING LATIN SMALL LETTER E
Date Thu, 11 Jun 2009 05:21:08 GMT

    [ https://issues.apache.org/jira/browse/LUCENE-1545?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12718300#action_12718300 ]

Robert Muir commented on LUCENE-1545:
-------------------------------------

if you are looking for a shorter-term solution (since i think LUCENE-1488 will take quite a
bit more time), it would be possible to make StandardAnalyzer more 'unicode-friendly'.

it's not possible to make it 'correct', and adding more unicode friendliness would make
backwards compatibility a much more complex issue (different unicode versions across JVM
versions, etc).

but if you want, i'm willing to come up with some minor grammar changes for StandardAnalyzer
that could help with cases like this one.


> Standard analyzer does not correctly tokenize combining character U+0364 COMBINING LATIN SMALL LETTER E
> -------------------------------------------------------------------------------------------------------
>
>                 Key: LUCENE-1545
>                 URL: https://issues.apache.org/jira/browse/LUCENE-1545
>             Project: Lucene - Java
>          Issue Type: Bug
>          Components: Analysis
>    Affects Versions: 2.4
>         Environment: Linux x86_64, Sun Java 1.6
>            Reporter: Andreas Hauser
>            Priority: Minor
>             Fix For: 3.0
>
>         Attachments: AnalyzerTest.java
>
>
> Standard analyzer does not correctly tokenize combining character U+0364 COMBINING LATIN
> SMALL LETTER E.
> The word "moͤchte" is incorrectly tokenized into "mo" "chte"; the combining character
> is lost.
> Expected result is only one token, "moͤchte".
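
The behaviour described in the report can be reproduced outside Lucene with a small
sketch. It is not StandardAnalyzer's grammar: the class name CombiningMarkDemo and both
regular expressions are hypothetical stand-ins chosen for illustration. A rule that accepts
only letters (\p{L}) breaks the token at U+0364, which is a non-spacing mark, while a rule
that also accepts combining marks (\p{M}) keeps the word whole.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

// Hypothetical demo class; it does not use or reproduce any Lucene code.
final class CombiningMarkDemo {
    // Letters only: any combining mark ends the current token and is dropped.
    static final Pattern LETTERS_ONLY = Pattern.compile("\\p{L}+");
    // Letters plus combining marks (Unicode general category M).
    static final Pattern LETTERS_AND_MARKS = Pattern.compile("[\\p{L}\\p{M}]+");

    // Collect every maximal match of the pattern as one token.
    static List<String> tokenize(Pattern pattern, String text) {
        List<String> tokens = new ArrayList<>();
        Matcher matcher = pattern.matcher(text);
        while (matcher.find()) {
            tokens.add(matcher.group());
        }
        return tokens;
    }

    public static void main(String[] args) {
        // "mo" + COMBINING LATIN SMALL LETTER E + "chte"
        String word = "mo\u0364chte";
        // The combining character is a non-spacing mark, not a letter.
        System.out.println(Character.getType('\u0364') == Character.NON_SPACING_MARK);
        // Two tokens, "mo" and "chte": the split described in the report.
        System.out.println(tokenize(LETTERS_ONLY, word).size());
        // One token: the whole word, mark included.
        System.out.println(tokenize(LETTERS_AND_MARKS, word).size());
    }
}
```

Which characters fall into \p{L} and \p{M} depends on the Unicode version shipped with the
JVM, which is the backwards-compatibility concern raised in the comment above.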

-- 
This message is automatically generated by JIRA.
-
You can reply to this email to add a comment to the issue online.


---------------------------------------------------------------------
To unsubscribe, e-mail: java-dev-unsubscribe@lucene.apache.org
For additional commands, e-mail: java-dev-help@lucene.apache.org

