lucene-solr-dev mailing list archives

From "Hoss Man (JIRA)" <>
Subject [jira] Commented: (SOLR-1410) remove deprecated custom encoding support in russian/greek analysis
Date Wed, 09 Sep 2009 03:59:00 GMT


Hoss Man commented on SOLR-1410:

Committed revision 812760.

Thanks, Robert.

> remove deprecated custom encoding support in russian/greek analysis
> -------------------------------------------------------------------
>                 Key: SOLR-1410
>                 URL:
>             Project: Solr
>          Issue Type: Task
>          Components: Analysis
>            Reporter: Robert Muir
>            Assignee: Hoss Man
>            Priority: Minor
>             Fix For: 1.4
>         Attachments: SOLR-1410.patch
> In this case, the analyzers have strange custom encoding support, and it has been deprecated in Lucene.
> For example, someone using CP1251 in the Russian analyzer is simply storing Ж as 0xC6; it's being represented as Æ.
> LUCENE-1793: Deprecate the custom encoding support in the Greek and Russian
>     Analyzers. If you need to index text in these encodings, please use Java's
>     character set conversion facilities (InputStreamReader, etc) during I/O, 
>     so that Lucene can analyze this text as Unicode instead.
> I noticed that in Solr, the factories for these token streams allow these configuration options, which are deprecated in 2.9 and will be removed in 3.0.
> Let me know the policy (how exactly do you deprecate a config option in Solr: log a warning, etc.?) and I'd be happy to create a patch.
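
For illustration, the advice quoted from LUCENE-1793 (convert the charset during I/O so the analyzer only ever sees Unicode) could look roughly like the sketch below. The class name Cp1251Decode and its decode helper are made up for this example and are not part of Lucene or Solr; it also assumes the JVM ships the windows-1251 charset, which mainstream JDKs do. It decodes the single byte 0xC6 twice: once as CP1251, which gives Ж (U+0416), and once as ISO-8859-1, which gives Æ (U+00C6), the mix-up described above.

```java
import java.io.ByteArrayInputStream;
import java.io.IOException;
import java.io.InputStreamReader;
import java.io.Reader;
import java.nio.charset.Charset;

// Hypothetical demo class; not part of Lucene or Solr.
public class Cp1251Decode {

    // Decode raw bytes into a Unicode String using the named charset,
    // going through InputStreamReader as the LUCENE-1793 note suggests.
    static String decode(byte[] raw, String charsetName) throws IOException {
        StringBuilder out = new StringBuilder();
        try (Reader reader = new InputStreamReader(
                new ByteArrayInputStream(raw), Charset.forName(charsetName))) {
            int c;
            while ((c = reader.read()) != -1) {
                out.append((char) c);
            }
        }
        return out.toString();
    }

    public static void main(String[] args) throws IOException {
        byte[] raw = {(byte) 0xC6};

        // Correct: the byte is CP1251, so it decodes to Ж (U+0416).
        String correct = decode(raw, "windows-1251");
        // Wrong: treating the same byte as Latin-1 yields Æ (U+00C6).
        String wrong = decode(raw, "ISO-8859-1");

        // Print code points in hex to avoid console encoding surprises.
        System.out.println(Integer.toHexString(correct.charAt(0))); // prints 416
        System.out.println(Integer.toHexString(wrong.charAt(0)));   // prints c6
    }
}
```

A Reader built this way can be handed to the analysis chain in place of raw bytes, so no encoding option is needed on the analyzer or its factory.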

This message is automatically generated by JIRA.
You can reply to this email to add a comment to the issue online.
