lucene-solr-user mailing list archives

From Avi Rosenschein <arosensch...@gmail.com>
Subject Re: Unicode case folding
Date Mon, 21 Feb 2011 18:20:46 GMT
Excellent. Thanks, Robert!

-- Avi

On Mon, Feb 21, 2011 at 19:24, Robert Muir <rcmuir@gmail.com> wrote:

> On Mon, Feb 21, 2011 at 12:16 PM, Avi Rosenschein
> <arosenschein@gmail.com> wrote:
> > Is there any analyzer that can do full Unicode case folding (for example,
> > as described at
> > http://www.w3.org/International/wiki/Case_folding#Recommendations_for_Case_Folding
> > )?
>
> Hi, in branch_3x you can use the ICUNormalizer2FilterFactory to do
> this (normalization mode NFKC_CF)
>
>
> http://svn.apache.org/repos/asf/lucene/dev/branches/branch_3x/solr/contrib/analysis-extras/src/java/org/apache/solr/analysis/ICUNormalizer2FilterFactory.java
>
> You can simply use this instead of LowerCaseFilter (just set up your
> solr/lib with the solr-analysis-extras.jar, icu jar, and lucene's
> contrib-icu jar).
>
> > If there isn't an analyzer for this - any suggestions on how to roll my own?
> > Should I simply apply String.toUpperCase() followed by .toLowerCase()?
>
> No, I would recommend using the actual full case folding (with
> normalization) instead. This is not the same as uppercase + lowercase.
> For example, it will correctly handle the three forms of Greek sigma
> (Σ, σ, ς).
>
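
To illustrate the point that uppercase + lowercase is not the same as full case folding, here is a minimal JDK-only sketch. It does not use the ICU filter mentioned above; the class and method names are illustrative, and the sharp-s pair is an example chosen here, not one from the thread. It assumes a JVM whose Unicode data includes U+1E9E (Java 7 or later). Full case folding maps both characters to "ss", while the round trip leaves them different.

```java
import java.util.Locale;

final class CaseRoundTripDemo {

    // The round trip proposed in the question: uppercase, then lowercase.
    // Locale.ROOT keeps the result independent of the JVM's default locale.
    static String upperLower(String s) {
        return s.toUpperCase(Locale.ROOT).toLowerCase(Locale.ROOT);
    }

    public static void main(String[] args) {
        String sharpS = "\u00DF";        // LATIN SMALL LETTER SHARP S
        String capitalSharpS = "\u1E9E"; // LATIN CAPITAL LETTER SHARP S

        String a = upperLower(sharpS);        // uppercases to "SS", then lowercases
        String b = upperLower(capitalSharpS); // already uppercase, then lowercases

        // The two spellings do not collapse to one form under the round trip.
        System.out.println(a + " vs " + b + " -> equal? " + a.equals(b));
    }
}
```

A caseless match built on this round trip would therefore treat the two spellings as distinct terms, which is the kind of gap a real case-folding filter is meant to close.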
