lucene-dev mailing list archives

From "David Smiley"
Subject RE: Should ASCIIFoldingFilter be deprecated?
Date Tue, 08 Feb 2011 14:12:05 GMT

Chris Hostetter-3 wrote:
> CharFilters and TokenFilters have different purposes, though...
> (i.e., if you use MappingCharFilter, you can't then tokenize on some of the
> characters you filtered away)

Right, but it's hard to imagine wanting to tokenize on an accent character,
or on any other character that these particular mapping files modify.
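The ordering point above can be made concrete with a minimal plain-Java sketch. It does not use Lucene. The mapping table (including the `&` => `and` entry), the letter-or-digit tokenizer, and the class and method names are all hypothetical stand-ins. The sketch shows that applying a mapping before tokenization (the CharFilter position) can change token boundaries. Applying the same mapping to each token afterwards (the TokenFilter position) cannot.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.Map;

// Hypothetical sketch, NOT Lucene's API: compares applying a character mapping
// before tokenization (CharFilter position) with applying it per token
// afterwards (TokenFilter position).
public class CharVsTokenFilter {

    // Hypothetical mapping table: one accent fold and one punctuation rewrite.
    static final Map<String, String> MAPPING = Map.of("&", "and", "\u00e9", "e");

    // Apply every mapping entry to a piece of text (entries do not overlap,
    // so iteration order does not affect the result).
    static String applyMapping(String text) {
        String result = text;
        for (Map.Entry<String, String> e : MAPPING.entrySet()) {
            result = result.replace(e.getKey(), e.getValue());
        }
        return result;
    }

    // Toy tokenizer: splits on any character that is not a letter or digit.
    static List<String> tokenize(String text) {
        List<String> tokens = new ArrayList<>();
        StringBuilder current = new StringBuilder();
        for (int i = 0; i < text.length(); i++) {
            char c = text.charAt(i);
            if (Character.isLetterOrDigit(c)) {
                current.append(c);
            } else if (current.length() > 0) {
                tokens.add(current.toString());
                current.setLength(0);
            }
        }
        if (current.length() > 0) {
            tokens.add(current.toString());
        }
        return tokens;
    }

    // CharFilter-style: map the raw text, then tokenize the mapped text.
    static List<String> charFilterThenTokenize(String text) {
        return tokenize(applyMapping(text));
    }

    // TokenFilter-style: tokenize the raw text, then map each token.
    static List<String> tokenizeThenTokenFilter(String text) {
        List<String> folded = new ArrayList<>();
        for (String token : tokenize(text)) {
            folded.add(applyMapping(token));
        }
        return folded;
    }

    public static void main(String[] args) {
        String input = "AT&T caf\u00e9";
        System.out.println(charFilterThenTokenize(input));  // [ATandT, cafe]
        System.out.println(tokenizeThenTokenFilter(input)); // [AT, T, cafe]
    }
}
```

With the hypothetical `&` => `and` entry, the CharFilter-style pipeline can no longer split on `&`, which is the situation Hoss describes. For an accent-only entry such as `é` => `e`, both orders produce the same token.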

Steven A Rowe wrote:
> AFAIK, ISOLatin1AccentFilter was deprecated because ASCIIFoldingFilter
> provides a superset of its mappings.

*If* that is the case, then this file should also be removed:

Steven A Rowe wrote:
> I haven't done any benchmarking, but I'm pretty sure that
> ASCIIFoldingFilter can achieve a significantly higher throughput rate than
> MappingCharFilter, and given that, it probably makes sense to keep both,
> to allow people to make the choice about the tradeoff between the
> flexibility provided by the human-readable (and editable) mapping file and
> the speed provided by ASCIIFoldingFilter.

I'm skeptical that the difference, whatever it is, matters in the scheme of
things. The cost of keeping it is confusion for users, and more code to
maintain.

~ David Smiley

Sent from the Solr - Dev mailing list archive at
