lucene-dev mailing list archives

From "David Smiley (@MITRE.org)" <DSMI...@mitre.org>
Subject RE: Should ASCIIFoldingFilter be deprecated?
Date Tue, 08 Feb 2011 14:12:05 GMT


Chris Hostetter-3 wrote:
> 
> CharFilters and TokenFilters have different purposes though...
> 
> http://wiki.apache.org/solr/AnalyzersTokenizersTokenFilters#When_To_use_a_CharFilter_vs_a_TokenFilter
> 
> (ie: If you use MappingCharFilter, you can't then tokenize on some of the 
> characters you filtered away)
> 

Right, but it’s hard to imagine wanting to tokenize on an accent character
or some other modification specified in these particular mapping files.
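
To make the ordering point concrete, here is a minimal sketch in plain Python (not Lucene code). The mapping tables, the toy tokenizer, and the function names are all hypothetical and exist only for illustration. It shows that mapping before or after tokenization gives the same tokens for accent folding, and differs only when the mapping removes a character the tokenizer would have split on.

```python
import re

# Hypothetical toy mappings, not the contents of any real Solr mapping file.
ACCENT_MAP = {"\u00e9": "e", "\u00fc": "u"}  # e-acute -> e, u-umlaut -> u
HYPHEN_MAP = {"-": ""}                       # maps the hyphen away entirely


def map_chars(text, mapping):
    """Rewrite each character through the mapping; unmapped characters pass through."""
    return "".join(mapping.get(ch, ch) for ch in text)


def tokenize(text):
    """Toy tokenizer: split on runs of whitespace and hyphens."""
    return [tok for tok in re.split(r"[\s-]+", text) if tok]


def char_filter_pipeline(text, mapping):
    # Mapping runs first, so the tokenizer never sees the mapped characters.
    return tokenize(map_chars(text, mapping))


def token_filter_pipeline(text, mapping):
    # Tokenizer runs first; the mapping is applied to each token afterwards.
    return [map_chars(tok, mapping) for tok in tokenize(text)]


text = "caf\u00e9-m\u00fcsli"

# Accent folding: both orderings produce the same two tokens.
print(char_filter_pipeline(text, ACCENT_MAP))
print(token_filter_pipeline(text, ACCENT_MAP))

# Hyphen removal: the char-filter ordering yields one fused token,
# while the token-filter ordering still splits on the hyphen.
print(char_filter_pipeline(text, HYPHEN_MAP))
print(token_filter_pipeline(text, HYPHEN_MAP))
```

For the accent-style mappings under discussion, the two orderings agree in this sketch; they diverge only for mappings that touch tokenizer delimiters.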


Steven A Rowe wrote:
> 
> AFAIK, ISOLatin1AccentFilter was deprecated because ASCIIFoldingFilter
> provides a superset of its mappings.
> 

*If* that is the case then this file should also be removed:
solr/example/solr/conf/mapping-ISOLatin1Accent.txt
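
One way to test the superset claim before removing anything would be to compare the two mapping tables entry by entry. The following is only a sketch: the function name is hypothetical, and the two tables are toy stand-ins rather than the real contents of the mapping file or of ASCIIFoldingFilter.

```python
def missing_mappings(candidate, reference):
    """Return the entries of `reference` that `candidate` lacks or maps differently.

    An empty result means `candidate` covers everything in `reference`,
    i.e. it is a superset of it.
    """
    return {src: dst for src, dst in reference.items() if candidate.get(src) != dst}


# Toy stand-ins (assumptions for illustration only).
latin1_table = {"\u00e0": "a", "\u00e9": "e"}                 # a-grave, e-acute
folding_table = {"\u00e0": "a", "\u00e9": "e", "\u0142": "l"}  # adds l-with-stroke

# Empty dict: every toy Latin-1 entry is covered by the toy folding table.
print(missing_mappings(folding_table, latin1_table))

# Non-empty: the reverse direction is missing the extra entry.
print(missing_mappings(latin1_table, folding_table))
```

Running the same comparison against the real tables would show whether the file is truly redundant.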


Steven A Rowe wrote:
> 
> I haven't done any benchmarking, but I'm pretty sure that
> ASCIIFoldingFilter can achieve a significantly higher throughput rate than
> MappingCharFilter, and given that, it probably makes sense to keep both,
> to allow people to make the choice about the tradeoff between the
> flexibility provided by the human-readable (and editable) mapping file and
> the speed provided by ASCIIFoldingFilter.
> 

I'm skeptical that the difference, whatever it is, is relevant in the scheme
of things. The cost of keeping it is confusion for users and more code to
maintain.

~ David Smiley

-----
 Author: https://www.packtpub.com/solr-1-4-enterprise-search-server/book
