lucene-general mailing list archives

From Chris Hostetter <hossman_luc...@fucit.org>
Subject Re: special characters "ø" indexing/searching
Date Mon, 22 Nov 2010 19:58:27 GMT

: I disagree with Hoss on this issue, removing diacritics in a filter is
: not going to "mess up highlighting". The offsets are set by the
: tokenizer. So it's no different than stemming or any other process.
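A minimal Python sketch of the point about offsets. It is a toy model, not Lucene's API: the `Token` class, `tokenize`, `fold_filter`, `highlight` and the folding table are all hypothetical. The tokenizer records start/end offsets into the original text. A folding filter later rewrites only the term text, so a highlighter that slices the original text by those offsets still marks the unfolded word. The table spells out "ø" explicitly because that character has no Unicode canonical decomposition, so NFD-based accent stripping alone would not turn it into "o".

```python
import re
from dataclasses import dataclass, replace

@dataclass(frozen=True)
class Token:
    """Hypothetical toy token: term text plus offsets into the ORIGINAL text."""
    term: str
    start: int
    end: int

def tokenize(text):
    # Toy tokenizer: runs of word characters; offsets are fixed here, once.
    return [Token(m.group(), m.start(), m.end()) for m in re.finditer(r"\w+", text)]

# Explicit folding table (an assumption, far from complete). "ø" needs an
# explicit entry: it has no canonical decomposition, so NFD stripping misses it.
FOLD = str.maketrans({"ø": "o", "Ø": "O", "å": "a", "Å": "A", "æ": "ae", "Æ": "AE"})

def fold_filter(tokens):
    # Token filter: rewrites the term text only; start/end are copied unchanged.
    return [replace(t, term=t.term.translate(FOLD)) for t in tokens]

def highlight(text, tokens, query_term):
    # Match on the indexed (folded) term, but cut the ORIGINAL text by offsets.
    out, last = [], 0
    for t in tokens:
        if t.term.lower() == query_term:
            out.append(text[last:t.start])
            out.append("<em>" + text[t.start:t.end] + "</em>")
            last = t.end
    out.append(text[last:])
    return "".join(out)

text = "Søren Kierkegaard"
tokens = fold_filter(tokenize(text))
print(tokens[0])                         # Token(term='Soren', start=0, end=5)
print(highlight(text, tokens, "soren"))  # <em>Søren</em> Kierkegaard
print(fold_filter(tokenize("Bær")))      # [Token(term='Baer', start=0, end=3)]
```

The last line covers the length-changing case: "Bær" is indexed as the longer term "Baer", but its offsets stay (0, 3). In this model a filter that changes token length does not disturb highlighting.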

thanks for correcting me dude ... i'm not sure what i was thinking of, but 
for some reason i thought there was an issue with the highlighter and 
token filters that changed the lengths of tokens (including stemming).

: The *only* situation where you should use a CharFilter, is when you
: must change this stuff before the tokenizer.

Can you elaborate on that, because it's definitely something that i'm 
getting more and more confused by, so i'm sure other people are confused 
as well.

what is an example of a situation where you "must" change stuff before the 
tokenizer?  the HTML Stripper is the one example i understand, but the 
purpose of the mapping char filter no longer makes sense to me in light of 
this thread.
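One plausible case where a change has to happen before the tokenizer is sketched below as a toy Python model. All names, the mapping rules, and the word-character tokenizer are assumptions, not Lucene code. If the tokenizer splits on anything that is not a word character, "C++" and "C#" both come out as the token "C", and no token filter running afterwards can tell them apart. A mapping applied to the character stream first keeps them distinct. Because that mapping changes lengths, it also records a position table so the tokenizer's offsets can be corrected back to the original text, roughly the role offset correction (`correctOffset`) plays in Lucene's `CharFilter`.

```python
import re

# Hypothetical mapping rules, for illustration only.
MAPPINGS = {"++": "plusplus", "#": "sharp"}

def mapping_char_filter(text, mappings):
    """Rewrite text before tokenization; return (mapped_text, correct_offset).

    correct_offset(pos) translates a position in mapped_text back to a
    position in the original text.
    """
    # Longest keys first so a longer rule wins over a shorter prefix.
    keys = sorted(mappings, key=len, reverse=True)
    pattern = re.compile("|".join(re.escape(k) for k in keys))
    out, offsets, last = [], [], 0
    for m in pattern.finditer(text):
        # Unchanged stretch: each mapped position points at the same original char.
        out.append(text[last:m.start()])
        offsets.extend(range(last, m.start()))
        # Replacement: every inserted char points at the start of the matched span.
        repl = mappings[m.group()]
        out.append(repl)
        offsets.extend([m.start()] * len(repl))
        last = m.end()
    out.append(text[last:])
    offsets.extend(range(last, len(text)))
    # Sentinel so an end offset equal to len(mapped_text) still resolves.
    offsets.append(len(text))

    def correct_offset(pos):
        return offsets[pos]

    return "".join(out), correct_offset

def analyze(text):
    """Char filter, then a toy word-character tokenizer; offsets are ORIGINAL."""
    mapped, correct_offset = mapping_char_filter(text, MAPPINGS)
    return [(m.group(), correct_offset(m.start()), correct_offset(m.end()))
            for m in re.finditer(r"\w+", mapped)]

text = "I like C++ and C#"
print([m.group() for m in re.finditer(r"\w+", text)])  # ['I', 'like', 'C', 'and', 'C']
print(analyze(text))
```

With the char filter, the last line yields the terms "Cplusplus" at (7, 10) and "Csharp" at (15, 17). Since `text[7:10]` is "C++" and `text[15:17]` is "C#", a highlighter using these offsets would still mark the original characters.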


-Hoss
