lucene-dev mailing list archives

From Simon Willnauer <simon.willna...@googlemail.com>
Subject Re: [jira] Commented: (LUCENE-1689) supplementary character handling
Date Mon, 16 Nov 2009 15:48:37 GMT
+1

On 11/16/09, Robert Muir (JIRA) <jira@apache.org> wrote:
>
>     [
> https://issues.apache.org/jira/browse/LUCENE-1689?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12778378#action_12778378
> ]
>
> Robert Muir commented on LUCENE-1689:
> -------------------------------------
>
> A couple of people have asked me about this issue lately. I would prefer to
> spin off smaller issues rather than create large patches that go out of
> date.
>
> I also think Simon is interested in working on some of this, so this means
> more JIRA spam, but I think it will be easier to make progress.
>
>
>> supplementary character handling
>> --------------------------------
>>
>>                 Key: LUCENE-1689
>>                 URL: https://issues.apache.org/jira/browse/LUCENE-1689
>>             Project: Lucene - Java
>>          Issue Type: Improvement
>>            Reporter: Robert Muir
>>            Priority: Minor
>>             Fix For: 3.1
>>
>>         Attachments: LUCENE-1689.patch, LUCENE-1689.patch,
>> LUCENE-1689.patch, LUCENE-1689_lowercase_example.txt,
>> testCurrentBehavior.txt
>>
>>
>> for Java 5. Java 5 is based on Unicode 4, which means variable-width
>> encoding.
>> Supplementary character support should be fixed for code that works with
>> char/char[].
>> For example:
>> StandardAnalyzer, SimpleAnalyzer, StopAnalyzer, etc. should at least be
>> changed so they don't actually remove supplementary characters, or modified
>> to look for surrogates and behave correctly.
>> LowercaseFilter should be modified to lowercase supplementary characters
>> correctly.
>> CharTokenizer should either be deprecated or changed so that isTokenChar()
>> and normalize() use int.
>> In all of these cases, code should remain optimized for the BMP case, and
>> supplementary characters should be the exception, but still work.
>
> --
> This message is automatically generated by JIRA.
> -
> You can reply to this email to add a comment to the issue online.
>
>
> ---------------------------------------------------------------------
> To unsubscribe, e-mail: java-dev-unsubscribe@lucene.apache.org
> For additional commands, e-mail: java-dev-help@lucene.apache.org
>
>
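
For readers unfamiliar with the problem described in the quoted issue, here is a
minimal sketch of code-point-aware lowercasing and tokenizing using only the
standard Java 5 code point methods on String, StringBuilder and Character. The
class name SupplementaryDemo and its methods are hypothetical illustrations;
they are not Lucene's API and not the patch attached to LUCENE-1689. The
int-based isTokenChar() here only mirrors the signature change the issue
proposes.

```java
import java.util.ArrayList;
import java.util.List;

// Hypothetical demo class; not part of Lucene.
public class SupplementaryDemo {

    // Lowercases one code point at a time. A loop over char values would see
    // the two surrogates of a supplementary character separately, and
    // Character.toLowerCase(char) leaves surrogates unchanged.
    public static String lowerCaseByCodePoint(String input) {
        StringBuilder out = new StringBuilder(input.length());
        for (int i = 0; i < input.length(); ) {
            int cp = input.codePointAt(i);          // reads 1 or 2 chars
            out.appendCodePoint(Character.toLowerCase(cp));
            i += Character.charCount(cp);           // advance by 1 or 2
        }
        return out.toString();
    }

    // Assumed stand-in for an int-based isTokenChar(); here: letters only.
    public static boolean isTokenChar(int codePoint) {
        return Character.isLetter(codePoint);
    }

    // Splits on non-token code points and lowercases the tokens.
    public static List<String> tokenize(String input) {
        List<String> tokens = new ArrayList<String>();
        StringBuilder current = new StringBuilder();
        for (int i = 0; i < input.length(); ) {
            int cp = input.codePointAt(i);
            if (isTokenChar(cp)) {
                current.appendCodePoint(Character.toLowerCase(cp));
            } else if (current.length() > 0) {
                tokens.add(current.toString());
                current.setLength(0);
            }
            i += Character.charCount(cp);
        }
        if (current.length() > 0) {
            tokens.add(current.toString());
        }
        return tokens;
    }
}
```

With a char-based predicate such as Character.isLetter(char), both halves of a
surrogate pair are rejected, so a supplementary letter like U+10400 would be
dropped from the token stream; the int-based version above keeps it. For text
that stays in the BMP, each step still consumes a single char, so the loop
shape stays close to the existing char-based one.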
