lucene-dev mailing list archives

From Dawid Weiss <dawid.we...@cs.put.poznan.pl>
Subject Re: [jira] Updated: (LUCENE-1029) Illegal character replacements in ISOLatin1AccentFilter
Date Wed, 17 Oct 2007 06:35:00 GMT

This gets even more complicated when you throw Polish in. We do have diacritics
(such as ó, ż, ź or ą):

http://www.fileformat.info/info/unicode/char/0105/index.htm

but we _also_ have things like "ł" (l with a stroke):

http://www.fileformat.info/info/unicode/char/0142/index.htm

I don't think the stroke in "ł" would qualify as a diacritic mark... to me it's 
more like a different letter.
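The difference also shows up in Unicode itself. ą (U+0105) has a canonical decomposition into "a" plus a combining ogonek, so decomposing and dropping combining marks turns it into "a". ł (U+0142) has no decomposition, so the same trick leaves it untouched and it has to be mapped to "l" explicitly. A minimal sketch using the JDK's java.text.Normalizer (this is not Lucene code, and the helper names stripMarks and foldPolish are hypothetical):

```java
import java.text.Normalizer;

public class PolishFolding {
    // Decompose to NFD, then drop all combining marks (\p{M}).
    // Handles ą, ó, ż, ź, ć, ... but NOT ł, which has no decomposition.
    public static String stripMarks(String s) {
        String decomposed = Normalizer.normalize(s, Normalizer.Form.NFD);
        return decomposed.replaceAll("\\p{M}+", "");
    }

    // Hypothetical helper: strip marks, then map the stroked l by hand.
    public static String foldPolish(String s) {
        return stripMarks(s)
                .replace('\u0142', 'l')  // ł -> l
                .replace('\u0141', 'L'); // Ł -> L
    }

    public static void main(String[] args) {
        String text = "\u017C\u00F3\u0142\u0107 \u0142\u0105ka"; // "żółć łąka"
        System.out.println(stripMarks(text)); // "zołc łaka": the ł survives
        System.out.println(foldPolish(text)); // "zolc laka"
    }
}
```

A filter that only strips decomposable accents would therefore still miss ł; some explicit per-letter mapping is needed on top of it.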

Anyway, most Poles are _very_ comfortable writing e-mails and querying
search engines with diacritics stripped (and the letter ł replaced by l), even if
this often changes the meaning of the original word. I guess this is
because typing diacritics slows you down a bit. Pragmatism.

Dawid



