lucene-dev mailing list archives

From "Andrew Lynch (JIRA)" <>
Subject [jira] Commented: (LUCENE-1215) Support of Unicode Collation
Date Tue, 11 Mar 2008 05:42:46 GMT


Andrew Lynch commented on LUCENE-1215:

This will be quite useful. I used the Normalizer to implement my own custom analyzer for

There is actually a Normalizer equivalent in older versions of the Sun JDK, sun.text.Normalizer,
but as an internal Sun class it is not portable across VMs.

I ended up using reflection to check for java.text.Normalizer first, falling back to
sun.text.Normalizer if it was absent, and finally performing no normalization if neither
could be found, to preserve compatibility with non-Java 6 and non-Sun JDKs.
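
A rough sketch of that fallback chain is below. It is an illustration, not the code from my
analyzer: the class name PortableNormalizer and the choice of form NFD are made up for the
example, and the legacy sun.text.Normalizer signature (normalize(String, Mode, int) with a
static DECOMP field) is an assumption from memory that should be checked against the target JDK.

```java
import java.lang.reflect.Method;

// Hypothetical helper: resolves a normalizer once via reflection, then reuses it.
final class PortableNormalizer {
    private final Method method;      // resolved normalize method, or null if none was found
    private final Object[] extraArgs; // arguments that follow the input string

    PortableNormalizer() {
        Method m = null;
        Object[] extra = null;
        try {
            // Java 6+: java.text.Normalizer.normalize(CharSequence, Normalizer.Form)
            Class<?> norm = Class.forName("java.text.Normalizer");
            Class<?> form = Class.forName("java.text.Normalizer$Form");
            m = norm.getMethod("normalize", CharSequence.class, form);
            extra = new Object[] { form.getField("NFD").get(null) };
        } catch (Exception | LinkageError e) {
            try {
                // ASSUMPTION: older Sun JDKs expose normalize(String, Mode, int) and DECOMP.
                Class<?> norm = Class.forName("sun.text.Normalizer");
                Class<?> mode = Class.forName("sun.text.Normalizer$Mode");
                m = norm.getMethod("normalize", String.class, mode, int.class);
                extra = new Object[] { norm.getField("DECOMP").get(null), Integer.valueOf(0) };
            } catch (Exception | LinkageError e2) {
                // Neither class is usable: fall through to "no normalization".
                m = null;
                extra = null;
            }
        }
        this.method = m;
        this.extraArgs = extra;
    }

    // Returns the decomposed string, or the input unchanged if no normalizer is available.
    String normalize(String s) {
        if (method == null) {
            return s;
        }
        try {
            Object[] args = new Object[extraArgs.length + 1];
            args[0] = s;
            System.arraycopy(extraArgs, 0, args, 1, extraArgs.length);
            return (String) method.invoke(null, args);
        } catch (Exception e) {
            return s; // degrade gracefully rather than fail analysis
        }
    }

    public static void main(String[] argv) {
        PortableNormalizer n = new PortableNormalizer();
        // On a JVM with a usable normalizer, U+00E9 decomposes into two chars.
        System.out.println(n.normalize("\u00e9").length());
    }
}
```

Resolving the method once in the constructor keeps the reflection lookup out of the
per-token path; only the invoke call remains.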

> Support of Unicode Collation
> ----------------------------
>                 Key: LUCENE-1215
>                 URL:
>             Project: Lucene - Java
>          Issue Type: New Feature
>          Components: Analysis
>            Reporter: Hiroaki Kawai
>         Attachments:
> New in Java 6, we have java.text.Normalizer, which supports Unicode Standard Annex #15.
> The normalization it defines has four forms: C, D, KC, and KD. Canonical Decomposition
> or Compatibility Decomposition will normalize the representation of a String, and search
> results will be improved.
> I'd like to submit a TokenFilter code supporting this feature! :-)
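
To illustrate why the choice of form matters for matching, here is a small sketch using
java.text.Normalizer directly. The class NormalizationForms and its matches method are
hypothetical names for this example and are not taken from the proposed TokenFilter.

```java
import java.text.Normalizer;

// Hypothetical demo: compares two strings after applying the same normalization form.
final class NormalizationForms {

    // True if a and b have the same representation under the given form.
    static boolean matches(String a, String b, Normalizer.Form form) {
        return Normalizer.normalize(a, form).equals(Normalizer.normalize(b, form));
    }

    public static void main(String[] argv) {
        String precomposed = "\u00e9"; // e with acute as a single code point
        String combining = "e\u0301";  // e followed by a combining acute accent
        String ligature = "\uFB01";    // the "fi" ligature

        // Canonical forms unify the precomposed and combining spellings.
        System.out.println(matches(precomposed, combining, Normalizer.Form.NFC));
        // Canonical forms leave compatibility characters such as ligatures alone.
        System.out.println(matches(ligature, "fi", Normalizer.Form.NFC));
        // Compatibility forms fold the ligature into plain letters.
        System.out.println(matches(ligature, "fi", Normalizer.Form.NFKC));
    }
}
```

The same form has to be applied at index time and at query time, otherwise terms that
look identical can still fail to match.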

This message is automatically generated by JIRA.
You can reply to this email to add a comment to the issue online.

