lucene-solr-user mailing list archives

From Lance Norskog <goks...@gmail.com>
Subject Re: Search both diacritics and non-diacritics
Date Sun, 03 Jan 2010 00:31:01 GMT
The ASCIIFoldingFilter is a superset of the ISOLatin1AccentFilter,
which is deprecated. Here's the Javadoc from ASCIIFoldingFilter.
You did not mention which language you want to search.

Unfortunately, the ASCIIFoldingFilter is not mentioned on the Solr wiki.

http://www.lucidimagination.com/search/?q=ASCIIFoldingFilter+

http://lucene.apache.org/java/2_9_1/api/all/org/apache/lucene/analysis/ASCIIFoldingFilter.html

org.apache.lucene.analysis.ASCIIFoldingFilter
This class converts alphabetic, numeric, and symbolic Unicode
characters which are not in the first 127 ASCII characters (the "Basic
Latin" Unicode block) into their ASCII equivalents, if one exists.
Characters from the following Unicode blocks are converted; however,
only those characters with reasonable ASCII alternatives are
converted:

C1 Controls and Latin-1 Supplement: http://www.unicode.org/charts/PDF/U0080.pdf
Latin Extended-A: http://www.unicode.org/charts/PDF/U0100.pdf
Latin Extended-B: http://www.unicode.org/charts/PDF/U0180.pdf
Latin Extended Additional: http://www.unicode.org/charts/PDF/U1E00.pdf
Latin Extended-C: http://www.unicode.org/charts/PDF/U2C60.pdf
Latin Extended-D: http://www.unicode.org/charts/PDF/UA720.pdf
IPA Extensions: http://www.unicode.org/charts/PDF/U0250.pdf
Phonetic Extensions: http://www.unicode.org/charts/PDF/U1D00.pdf
Phonetic Extensions Supplement: http://www.unicode.org/charts/PDF/U1D80.pdf
General Punctuation: http://www.unicode.org/charts/PDF/U2000.pdf
Superscripts and Subscripts: http://www.unicode.org/charts/PDF/U2070.pdf
Enclosed Alphanumerics: http://www.unicode.org/charts/PDF/U2460.pdf
Dingbats: http://www.unicode.org/charts/PDF/U2700.pdf
Supplemental Punctuation: http://www.unicode.org/charts/PDF/U2E00.pdf
Alphabetic Presentation Forms: http://www.unicode.org/charts/PDF/UFB00.pdf
Halfwidth and Fullwidth Forms: http://www.unicode.org/charts/PDF/UFF00.pdf
See: http://en.wikipedia.org/wiki/Latin_characters_in_Unicode

The set of character conversions supported by this class is a superset of
those supported by Lucene's ISOLatin1AccentFilter which strips accents
from Latin1 characters. For example, 'à' will be replaced by 'a'.
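
To make the 'à' to 'a' example concrete, here is a rough sketch of the
general idea of accent folding using only the JDK. This is not the Lucene
implementation: the class name AccentFoldSketch and the method fold are
made up for illustration, and the approach (Unicode decomposition followed
by dropping combining marks) is a stand-in I am assuming for demonstration.
It only handles characters that decompose into a base letter plus marks, so
it will not fold things like 'ø' or 'ß', which have no such decomposition.

```java
import java.text.Normalizer;

public class AccentFoldSketch {

    // Hypothetical helper, not part of Lucene or Solr.
    // Step 1: NFD splits an accented letter into base letter + combining mark(s).
    // Step 2: the regex removes the combining marks (Unicode category M).
    static String fold(String input) {
        String decomposed = Normalizer.normalize(input, Normalizer.Form.NFD);
        return decomposed.replaceAll("\\p{M}+", "");
    }

    public static void main(String[] args) {
        // Unicode escapes are used so the source compiles under any file encoding.
        System.out.println(fold("\u00e0 la carte"));            // a la carte
        System.out.println(fold("cr\u00e8me br\u00fbl\u00e9e")); // creme brulee
    }
}
```

Applying the same folding at both index time and query time is what lets a
query with or without diacritics match the same indexed terms.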
