lucene-solr-commits mailing list archives

From Apache Wiki <wikidi...@apache.org>
Subject [Solr Wiki] Update of "SchemaDesign" by KojiSekiguchi
Date Mon, 05 Jul 2010 14:36:37 GMT
Dear Wiki user,

You have subscribed to a wiki page or wiki category on "Solr Wiki" for change notification.

The "SchemaDesign" page has been changed by KojiSekiguchi.
The comment on this change is: No more CharStreamAwareTokenizers needed.
http://wiki.apache.org/solr/SchemaDesign?action=diff&rev1=10&rev2=11

--------------------------------------------------

  Searching text in different languages is difficult. The Latin1Accent filters replace accented European characters with their US-ASCII equivalents: the French spelling ''protégé'' becomes the English spelling ''protege''.
  In Solr-1.3, use this in the filter stack of your "text" field type:
  {{{
+ <tokenizer class="solr.WhitespaceTokenizerFactory" />
  <filter class="solr.ISOLatin1AccentFilterFactory" />
  }}}
  In Solr-1.4, use this:
  {{{
  <charFilter class="solr.MappingCharFilterFactory" mapping="mapping-ISOLatin1Accent.txt"/>
+ <tokenizer class="solr.WhitespaceTokenizerFactory" />
  }}}
  
- At the moment you must also use this tokenizer with solr.MappingCharFilterFactory:
- {{{
- <tokenizer class="solr.CharStreamAwareWhitespaceTokenizerFactory"/>
- }}}
- Otherwise you will get errors (potentially including fatal, uncaught exceptions) when using
the lucene highlighter, etc: 
- 
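
For context, the Solr-1.4 lines above could sit inside a complete field type roughly as sketched below. This is a minimal illustration, not text from the wiki page: the field type name "text_folded" and the positionIncrementGap value are assumptions, and the tokenizer is written with the Factory suffix that Solr expects for tokenizer classes in schema.xml.
{{{
<!-- Hypothetical field type; the name and positionIncrementGap are illustrative assumptions -->
<fieldType name="text_folded" class="solr.TextField" positionIncrementGap="100">
  <analyzer>
    <!-- Char filters run on the raw character stream, before the tokenizer -->
    <charFilter class="solr.MappingCharFilterFactory" mapping="mapping-ISOLatin1Accent.txt"/>
    <!-- A plain tokenizer; per the change comment, no CharStreamAware variant is needed -->
    <tokenizer class="solr.WhitespaceTokenizerFactory"/>
  </analyzer>
</fieldType>
}}}
With an analyzer like this, accent folding applies at both index and query time, so a search for ''protege'' should also match documents containing ''protégé''.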
