Date: Mon, 22 Nov 2010 11:58:27 -0800 (PST)
From: Chris Hostetter
To: general@lucene.apache.org
Subject: Re: special characters "ø" indexing/searching

: I disagree with Hoss on
this issue, removing diacritics in a filter is not going to "mess up highlighting". The offsets are set by the tokenizer. So its no different than stemming or any other process.

thanks for correcting me dude ... i'm not sure what i was thinking of, but for some reason i thought there was an issue with the highlighter and token filters that changed the lengths of tokens (including stemming).

: The *only* situation where you should use a CharFilter, is when you must change this stuff before the tokenizer.

Can you elaborate on that, because it's definitely something that i'm getting more and more confused by, so i'm sure other people are confused as well.

what is an example of a situation where you "must" change stuff before the tokenizer? the HTML Stripper is the one example i understand, but the purpose of the mapping char filter no longer makes sense to me in light of this thread.

-Hoss
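
for anyone following along, here is a rough toy model of the point about offsets. it is plain Python, not Lucene code: the names tokenize, fold_filter, highlight and the FOLD table are all made up for illustration, and the folding table is nowhere near complete. the tokenizer records start/end offsets against the original text, the filter only rewrites the term, and the highlighter slices the original text by those offsets, so a filter that changes a token's length (like "ß" becoming "ss") should not disturb the highlighting.

```python
import re

# Hypothetical folding table for illustration only; not Lucene's actual mapping.
# Note that "ß" maps to two characters, so the folded term changes length.
FOLD = str.maketrans({"ø": "o", "Ø": "O", "ß": "ss"})


def tokenize(text):
    """Split on word characters, recording each token's offsets in the original text."""
    return [(m.group(), m.start(), m.end()) for m in re.finditer(r"\w+", text)]


def fold_filter(tokens):
    """Rewrite the term only; the offsets set by the tokenizer pass through unchanged."""
    return [(term.translate(FOLD).lower(), start, end) for term, start, end in tokens]


def highlight(text, tokens, query):
    """Wrap each matching token in <b> tags, slicing the ORIGINAL text by offset."""
    out, pos = [], 0
    for term, start, end in tokens:
        if term == query:
            out.append(text[pos:start])
            out.append("<b>" + text[start:end] + "</b>")
            pos = end
    out.append(text[pos:])
    return "".join(out)


text = "Smørrebrød på Straße"
tokens = fold_filter(tokenize(text))
print(highlight(text, tokens, "smorrebrod"))
print(highlight(text, tokens, "strasse"))
```

the folded term "strasse" is seven characters while the original "Straße" is six, but the highlight still wraps the right span, because nothing after the tokenizer touches the offsets. this sketch says nothing about cases where text has to change before tokenization.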