lucene-solr-dev mailing list archives

From "Hoss Man (JIRA)" <>
Subject [jira] Commented: (SOLR-1410) remove deprecated custom encoding support in russian/greek analysis
Date Thu, 03 Sep 2009 21:36:57 GMT


Hoss Man commented on SOLR-1410:

I don't think we've ever really had a situation like this... logging a warning seems like
the right course of action for now. Then, once the functionality is removed, we can change
the factory to fail on init if it sees the option is still set in schema.xml.
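
A minimal sketch of that two-phase policy, in plain Java rather than against the real Solr factory API. The class name `CharsetOptionCheck`, the `init` signature, the `failOnDeprecated` switch, and the option name `charset` are all assumptions made for illustration:

```java
import java.util.HashMap;
import java.util.Map;
import java.util.logging.Logger;

// Hypothetical stand-in for an analysis factory; NOT the real Solr API.
class CharsetOptionCheck {
    private static final Logger LOG =
            Logger.getLogger(CharsetOptionCheck.class.getName());

    // Assumed name of the deprecated schema.xml attribute.
    static final String DEPRECATED_OPTION = "charset";

    /**
     * Checks the factory arguments for the deprecated option.
     * Phase 1 (failOnDeprecated == false): log a warning and carry on.
     * Phase 2 (failOnDeprecated == true): refuse to initialize.
     * Returns true if the deprecated option was present.
     */
    static boolean init(Map<String, String> args, boolean failOnDeprecated) {
        if (!args.containsKey(DEPRECATED_OPTION)) {
            return false;
        }
        String msg = "Option '" + DEPRECATED_OPTION
                + "' is deprecated; convert text to Unicode during I/O instead.";
        if (failOnDeprecated) {
            throw new IllegalArgumentException(msg);
        }
        LOG.warning(msg);
        return true;
    }

    public static void main(String[] unused) {
        Map<String, String> args = new HashMap<>();
        args.put(DEPRECATED_OPTION, "Cp1251");

        init(args, false); // phase 1: only logs a warning

        try {
            init(args, true); // phase 2: fails on init
        } catch (IllegalArgumentException e) {
            System.out.println("init failed: " + e.getMessage());
        }
    }
}
```

Flipping a single switch when the functionality is removed keeps the check in one place, so the warning and the eventual failure report the same message.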

> remove deprecated custom encoding support in russian/greek analysis
> -------------------------------------------------------------------
>                 Key: SOLR-1410
>                 URL:
>             Project: Solr
>          Issue Type: Task
>          Components: Analysis
>            Reporter: Robert Muir
>            Priority: Minor
> In this case, the analyzers have strange encoding support, and it has been deprecated in Lucene.
> For example, someone using CP1251 in the Russian analyzer is simply storing Ж as 0xC6,
> so it's being represented as Æ.
> LUCENE-1793: Deprecate the custom encoding support in the Greek and Russian
>     Analyzers. If you need to index text in these encodings, please use Java's
>     character set conversion facilities (InputStreamReader, etc) during I/O, 
>     so that Lucene can analyze this text as Unicode instead.
> I noticed that in Solr, the factories for these token streams still allow these configuration
> options, which are deprecated in 2.9 and due to be removed in 3.0.
> Let me know the policy (how exactly do you deprecate a config option in Solr: log a warning,
> etc.?) and I'd be happy to create a patch.
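
To illustrate the conversion the LUCENE-1793 note recommends, here is a small sketch that decodes the byte 0xC6 through `InputStreamReader`. The class name `Cp1251Decode` and its `decode` helper are hypothetical; only standard JDK classes are used, and the `windows-1251` charset is assumed to be available in the running JRE:

```java
import java.io.ByteArrayInputStream;
import java.io.IOException;
import java.io.InputStreamReader;
import java.io.Reader;
import java.io.UncheckedIOException;
import java.nio.charset.Charset;

// Hypothetical helper for illustration; not part of Lucene or Solr.
class Cp1251Decode {

    // Decodes raw bytes to a Unicode String using the named charset.
    static String decode(byte[] bytes, String charsetName) {
        try (Reader reader = new InputStreamReader(
                new ByteArrayInputStream(bytes), Charset.forName(charsetName))) {
            StringBuilder sb = new StringBuilder();
            int c;
            while ((c = reader.read()) != -1) {
                sb.append((char) c);
            }
            return sb.toString();
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
    }

    public static void main(String[] unused) {
        byte[] raw = { (byte) 0xC6 };

        // Decoded with the right charset, the byte becomes U+0416 (Ж).
        String cyrillic = decode(raw, "windows-1251");
        System.out.println("U+" + Integer.toHexString(cyrillic.charAt(0)));

        // Treated as Latin-1, the same byte becomes U+00C6 (Æ).
        String latin = decode(raw, "ISO-8859-1");
        System.out.println("U+" + Integer.toHexString(latin.charAt(0)));
    }
}
```

With the decoding done at the I/O boundary, the analyzer only ever sees Unicode text and needs no charset option of its own.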

This message is automatically generated by JIRA.
You can reply to this email to add a comment to the issue online.
