lucene-solr-commits mailing list archives

From Apache Wiki <wikidi...@apache.org>
Subject [Solr Wiki] Update of "UnicodeCollation" by OtisGospodnetic
Date Fri, 04 Dec 2009 02:42:29 GMT
Dear Wiki user,

You have subscribed to a wiki page or wiki category on "Solr Wiki" for change notification.

The "UnicodeCollation" page has been changed by OtisGospodnetic.
The comment on this change is: Clarification FIXME for.... Robert Muir?.
http://wiki.apache.org/solr/UnicodeCollation?action=diff&rev1=1&rev2=2

--------------------------------------------------

  == Sorting text for multiple languages ==
  There are two approaches to supporting multiple languages:
  
-  * If there is a small list, consider defining collated fields for each language and using copyField.
+  * If there is a small list (FIXME: small list of Languages? Fields?), consider defining collated fields for each language and using copyField.
   * If there is a very large list, an alternative is to use the "Unicode default" collator.
  
  The Unicode default, or "ROOT" Locale, has rules that are designed to work well in general for most languages. To use it, simply define the language as the empty string.
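
As a rough illustration of the "ROOT" collator described above, the following Java sketch obtains a collator for `Locale.ROOT` (the locale whose language is the empty string) and uses it to compare and sort strings. The class and method names (`RootCollatorSketch`, `rootCollator`, `sortWithRoot`) are hypothetical and not part of Solr; the sketch uses only the JDK's `java.text.Collator` and says nothing about how Solr wires the collator into a field type.

```java
import java.text.Collator;
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;
import java.util.Locale;

public class RootCollatorSketch {

    // Hypothetical helper: a collator for the root locale at the given strength.
    // Locale.ROOT has an empty language, country and variant.
    static Collator rootCollator(int strength) {
        Collator collator = Collator.getInstance(Locale.ROOT);
        collator.setStrength(strength);
        return collator;
    }

    // Hypothetical helper: returns a sorted copy, ordered by the root collator.
    static List<String> sortWithRoot(List<String> words) {
        List<String> copy = new ArrayList<>(words);
        copy.sort(rootCollator(Collator.TERTIARY));
        return copy;
    }

    public static void main(String[] args) {
        // The root locale is the one whose language is the empty string.
        System.out.println(Locale.ROOT.getLanguage().isEmpty());

        // At primary strength, case differences are ignored.
        System.out.println(rootCollator(Collator.PRIMARY).compare("Tone", "tone") == 0);

        // Collated order instead of raw code point order.
        System.out.println(sortWithRoot(Arrays.asList("banana", "Apple", "cherry")));
    }
}
```

A plain `String.compareTo` sort would place "Apple" first only because uppercase letters have lower code points; the collator orders by letter first and treats case as a lower-level difference.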
@@ -70, +70 @@

  The example code below shows how to create a custom ruleset and dump it to a file.
  
  {{{
-     // get the default rules for germany
+     // get the default rules for Germany
      // these are called DIN 5007-1 sorting
      RuleBasedCollator baseCollator = (RuleBasedCollator) Collator.getInstance(new Locale("de", "DE"));
  
@@ -116, +116 @@

    </analyzer>
  </fieldType>
  }}}
- 
  Below is an example of what this would look like for two words that should match with this collator: Töne and toene.
  
  '''org.apache.solr.analysis.StandardTokenizerFactory'''
@@ -127, +126 @@

  ||<style="text-align: center;" |1>payload ||<class="debugdata"> ||<class="debugdata"> ||
  
  
+ 
+ 
  '''org.apache.solr.analysis.CollationKeyFilterFactory   {strength=primary, custom=customRules.dat}'''
  ||<tablewidth="" tableclass="analysis"style="text-align: center;" |1>term position ||<class="debugdata">1 ||<class="debugdata">2 ||
  ||<style="text-align: center;" |1>term text ||<class="debugdata">3䀘䀋#6;ࠂ怀#0;#0;#0; ||<class="debugdata">3䀘䀋#6;ࠂ怀#0;#0;#0; ||
@@ -134, +135 @@

  ||<style="text-align: center;" |1>source start,end ||<class="debugdata">0,4 ||<class="debugdata">5,10 ||
  ||<style="text-align: center;" |1>payload ||<class="debugdata"> ||<class="debugdata"> ||
  
- Please note that the strange output you see from the filter is really a binary collation key encoded in a special form.
- What is important is that it is the same value for equivalent tokens as defined by that collator.
  
+ 
+ 
+ Please note that the strange output you see from the filter is really a binary collation key encoded in a special form. What is important is that it is the same value for equivalent tokens as defined by that collator.
+ 
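
The point about equivalent tokens sharing one collation key can be sketched outside Solr with the JDK's `java.text.RuleBasedCollator`. The tailoring rules below are an assumption made for illustration (umlauts treated as the base vowel followed by "e"); they are not copied from the wiki page, whose ruleset is elided in this diff, and the sketch does not reproduce the special encoding that the filter applies to the key. The names `CollationKeySketch`, `UMLAUT_RULES`, `primaryCollator` and `sameKey` are hypothetical.

```java
import java.text.Collator;
import java.text.ParseException;
import java.text.RuleBasedCollator;
import java.util.Arrays;

public class CollationKeySketch {

    // Assumed tailoring for illustration only: each umlaut (written as
    // base letter + combining diaeresis U+0308) differs from "ae", "oe"
    // or "ue" only at the tertiary level.
    static final String UMLAUT_RULES =
          "& ae , a\u0308 & AE , A\u0308"
        + " & oe , o\u0308 & OE , O\u0308"
        + " & ue , u\u0308 & UE , U\u0308";

    // Builds a collator from the assumed rules and restricts it to
    // primary strength, so case and tertiary differences are ignored.
    static RuleBasedCollator primaryCollator() throws ParseException {
        RuleBasedCollator collator = new RuleBasedCollator(UMLAUT_RULES);
        collator.setStrength(Collator.PRIMARY);
        return collator;
    }

    // True when both tokens produce byte-for-byte identical collation keys.
    static boolean sameKey(String a, String b) throws ParseException {
        RuleBasedCollator collator = primaryCollator();
        byte[] keyA = collator.getCollationKey(a).toByteArray();
        byte[] keyB = collator.getCollationKey(b).toByteArray();
        return Arrays.equals(keyA, keyB);
    }

    public static void main(String[] args) throws ParseException {
        // "T\u00F6ne" is "Töne" written with a Unicode escape.
        System.out.println(sameKey("T\u00F6ne", "toene"));
        System.out.println(sameKey("T\u00F6ne", "Tone"));
    }
}
```

Because the keys are plain byte arrays, equality and ordering can be decided without consulting the collator again, which is why an index can store the key in place of the original term text.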
