lucene-solr-user mailing list archives

From Robert Muir <rcm...@gmail.com>
Subject Re: MappingCharFilterFactory equivalent for use after tokenizer?
Date Fri, 18 Jun 2010 23:56:20 GMT
On Fri, Jun 18, 2010 at 7:11 PM, Lance Norskog <goksron@gmail.com> wrote:

> Indeed. Also, it should be possible to output multiple synonyms based
> on the mapping: word_with_umlaut should become word_with_u and
> word_with_ue as synonyms. (Ok, maybe this example is wrong, but it
> illustrates the idea.)
>
>
I don't think we should do this. How many tokens would üüüüüüüüüüüü make?
(Such malformed input exists in the wild, e.g. someone spills beer on their
keyboard and the key gets sticky.)
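To make the arithmetic concrete: suppose each ü may map to either "u" or "ue". That is the hypothetical mapping from the quoted example, not the behavior of any actual Lucene/Solr filter. Each ü then doubles the number of output tokens, so a run of n ü characters yields 2^n variants. The standalone Java sketch below simply enumerates the spellings. It does not use the Lucene TokenFilter API, and the class and method names are made up for illustration.

```java
import java.util.ArrayList;
import java.util.List;

final class UmlautExpansion {
    // Hypothetical one-to-many mapping: each U+00FC (ü) may become "u" or "ue".
    static final String[] ALTERNATIVES = {"u", "ue"};

    // Returns every spelling obtained by independently replacing each ü.
    static List<String> expand(String token) {
        List<String> results = new ArrayList<>();
        results.add("");
        for (int i = 0; i < token.length(); i++) {
            char c = token.charAt(i);
            List<String> next = new ArrayList<>();
            for (String prefix : results) {
                if (c == '\u00FC') {
                    // Branch: one new variant per alternative spelling.
                    for (String alt : ALTERNATIVES) {
                        next.add(prefix + alt);
                    }
                } else {
                    // Other characters are copied through unchanged.
                    next.add(prefix + c);
                }
            }
            results = next;
        }
        return results;
    }

    public static void main(String[] args) {
        // A run of n ü characters produces 2^n variants.
        for (int n = 1; n <= 12; n++) {
            String token = "\u00FC".repeat(n);
            System.out.println(n + " -> " + expand(token).size() + " tokens");
        }
    }
}
```

Running it prints counts that double from 2 (one ü) to 4096 (twelve ü), while a normal word such as "müller" only yields "muller" and "mueller". That gap is why a one-to-many mapping after the tokenizer is risky on malformed input.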

-- 
Robert Muir
rcmuir@gmail.com
