lucene-solr-commits mailing list archives

From Apache Wiki <>
Subject [Solr Wiki] Update of "UnicodeCollation" by RobertMuir
Date Thu, 03 Mar 2011 03:23:03 GMT
Dear Wiki user,

You have subscribed to a wiki page or wiki category on "Solr Wiki" for change notification.

The "UnicodeCollation" page has been changed by RobertMuir.
The comment on this change is: add an example for ICU collation.


  = Unicode Collation =
- <!> [[Solr1.5]]
+ <!> [[Solr3.1]]
  == Overview ==
  [[|Unicode Collation]] is a method
to sort text in a language-sensitive way. It is primarily intended for sorting, but can also
be used for advanced search purposes.
@@ -144, +144 @@

  Please note that the strange output you see from the filter is really a binary collation
key encoded in a special form. What is important is that it is the same value for equivalent
tokens as defined by that collator.
+ == ICU Collation ==
+ For better performance, less memory usage, and support for more locales, you can add the
analysis-extras contrib and use ICUCollationKeyFilterFactory instead. See the [[|javadocs]]
for more information.
+ In general, the principles are the same; you just specify an RFC 3066 language identifier
with the locale parameter instead of specifying language+country+variant.
+ For example, to get German phonebook sort order:
+ {{{
+ <fieldType name="collatedICU" class="solr.TextField">
+   <analyzer>
+     <tokenizer class="solr.KeywordTokenizerFactory"/>
+     <filter class="solr.ICUCollationKeyFilterFactory"
+         locale="de@collation=phonebook"
+         strength="primary"
+     />
+   </analyzer>
+ </fieldType>
+ }}}
+ To use this filter, see solr/contrib/analysis-extras/README.txt for instructions on which
jars you need to add to your SOLR_HOME/lib directory.
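
Because collation is primarily intended for sorting, a type such as collatedICU would usually back a dedicated sort field rather than the field being searched. The fragment below is a minimal sketch of that wiring in schema.xml; the field names title and title_sort are hypothetical and are not part of the change above.
{{{
<!-- Hypothetical sort-only field that uses the collatedICU type defined above -->
<field name="title_sort" type="collatedICU" indexed="true" stored="false"/>
<!-- Copy the (hypothetical) title field into the sort field at index time -->
<copyField source="title" dest="title_sort"/>
}}}
With fields like these, a request could order results with sort=title_sort asc.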
