lucene-dev mailing list archives

From "Mark Miller (JIRA)" <>
Subject [jira] Commented: (LUCENE-1689) supplementary character handling
Date Mon, 16 Nov 2009 20:44:39 GMT


Mark Miller commented on LUCENE-1689:

If there is nothing we can do here, then we just have to do the best we can, such as
adding a prominent notice warning that if you switch JVMs between building and searching
the index and you are using or doing X, things will break.

We should put this in a spot that is always pretty visible - perhaps even a new readme file
titled something like IndexBackwardCompatibility, to which we can add other
tips and gotchas as they come up. Or MaintainingIndicesAcrossVersions, or FancyWhateverGetsYourAttentionAboutUpgradingStuff.
Or a permanent/sticky entry at the top of Changes.

> supplementary character handling
> --------------------------------
>                 Key: LUCENE-1689
>                 URL:
>             Project: Lucene - Java
>          Issue Type: Improvement
>            Reporter: Robert Muir
>            Priority: Minor
>             Fix For: 3.1
>         Attachments: LUCENE-1689.patch, LUCENE-1689.patch, LUCENE-1689.patch, LUCENE-1689_lowercase_example.txt,
> for Java 5. Java 5 is based on Unicode 4, which means variable-width encoding.
> Supplementary character support should be fixed for code that works with char/char[].
> For example:
> StandardAnalyzer, SimpleAnalyzer, StopAnalyzer, etc. should at least be changed so they
> don't actually remove supplementary characters, or modified to look for surrogates and
> behave correctly.
> LowercaseFilter should be modified to lowercase supplementary characters correctly.
> CharTokenizer should either be deprecated or changed so that isTokenChar() and normalize()
> use int.
> In all of these cases code should remain optimized for the BMP case, and supplementary
> characters should be the exception, but still work.
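
To illustrate the char versus int distinction raised in the issue description, here is a
minimal sketch using only java.lang APIs available since Java 5. It is not taken from the
attached patches; the class name SupplementaryDemo and its method names are hypothetical,
and U+10400 (DESERET CAPITAL LETTER LONG I) is just a convenient supplementary test
character. It shows that per-char lowercasing and letter checks miss a supplementary
character, while the int (code point) overloads handle it.

```java
// Hypothetical demo class; not part of Lucene or of the LUCENE-1689 patches.
class SupplementaryDemo {

    // Per-char lowercasing: each UTF-16 code unit is handled on its own,
    // so the two surrogate halves of a supplementary character pass through unchanged.
    static String lowerByChar(String s) {
        char[] buf = s.toCharArray();
        for (int i = 0; i < buf.length; i++) {
            buf[i] = Character.toLowerCase(buf[i]);
        }
        return new String(buf);
    }

    // Code-point-aware lowercasing: read a full code point (one or two chars),
    // lowercase it as an int, and advance by the number of chars it occupied.
    static String lowerByCodePoint(String s) {
        StringBuilder out = new StringBuilder(s.length());
        for (int i = 0; i < s.length(); ) {
            int cp = s.codePointAt(i);
            out.appendCodePoint(Character.toLowerCase(cp));
            i += Character.charCount(cp);
        }
        return out.toString();
    }

    // Per-char letter test, similar in spirit to an isTokenChar(char) check.
    static boolean allLettersByChar(String s) {
        for (int i = 0; i < s.length(); i++) {
            if (!Character.isLetter(s.charAt(i))) {
                return false;
            }
        }
        return true;
    }

    // Code-point letter test, similar in spirit to an isTokenChar(int) check.
    static boolean allLettersByCodePoint(String s) {
        for (int i = 0; i < s.length(); ) {
            int cp = s.codePointAt(i);
            if (!Character.isLetter(cp)) {
                return false;
            }
            i += Character.charCount(cp);
        }
        return true;
    }

    public static void main(String[] args) {
        // U+10400 lies outside the BMP and is stored as a surrogate pair.
        String upper = new String(Character.toChars(0x10400));

        System.out.println(Integer.toHexString(lowerByChar(upper).codePointAt(0)));      // prints 10400
        System.out.println(Integer.toHexString(lowerByCodePoint(upper).codePointAt(0))); // prints 10428
        System.out.println(allLettersByChar(upper));      // prints false
        System.out.println(allLettersByCodePoint(upper)); // prints true
    }
}
```

The code-point loops cost little for BMP-only text, since codePointAt() only looks at a
second char when it sees a high surrogate. Real filter code working on char[] buffers would
probably want an explicit BMP fast path, which this sketch leaves out.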

This message is automatically generated by JIRA.
You can reply to this email to add a comment to the issue online.
