From: "Greg Shackles"
To: java-user@lucene.apache.org
Date: Tue, 30 Dec 2008 09:36:57 -0500
Subject: Re: Filtering accents

Just thought I'd comment since I had to do word processing before indexing in my application as well. Matt's method is pretty similar to what I did. I wrote a filter that transforms the tokens as they get indexed (and also use that for searching).

Since I am indexing a block of words, rather than one document per word, I store the word in its original form in the payload of the token so I can retrieve it from the search. If your documents contain one word each, then Matt's suggestion of a stored field would be the right way to go. If not, I would suggest using payloads.

- Greg
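
To make the idea above concrete, here is a minimal sketch in plain Java. It is not Lucene's API and not the filter described in the message: the class `AccentFoldingSketch`, the `FoldedToken` type, and the methods `fold`, `analyze`, and `search` are all hypothetical names invented for illustration. It only assumes that "transforming the token" means stripping accents, and it models the payload as an extra field carried next to the folded term.

```java
import java.text.Normalizer;
import java.util.ArrayList;
import java.util.List;

// Hypothetical stand-in for a token filter plus payload; not Lucene's API.
final class AccentFoldingSketch {

    // One indexed token: the folded term used for matching, and the
    // original surface form, which plays the role of the payload.
    static final class FoldedToken {
        final String term;
        final String original;

        FoldedToken(String term, String original) {
            this.term = term;
            this.original = original;
        }
    }

    // Strip accents by decomposing to NFD and removing combining marks.
    // Letters without a canonical decomposition are left unchanged.
    static String fold(String word) {
        String decomposed = Normalizer.normalize(word, Normalizer.Form.NFD);
        return decomposed.replaceAll("\\p{M}+", "");
    }

    // "Index" a block of words: each token keeps its original form.
    static List<FoldedToken> analyze(String text) {
        List<FoldedToken> tokens = new ArrayList<>();
        for (String word : text.trim().split("\\s+")) {
            if (!word.isEmpty()) {
                tokens.add(new FoldedToken(fold(word), word));
            }
        }
        return tokens;
    }

    // Apply the same folding to the query, then return the original
    // forms of the matching tokens.
    static List<String> search(List<FoldedToken> index, String query) {
        String foldedQuery = fold(query);
        List<String> hits = new ArrayList<>();
        for (FoldedToken token : index) {
            if (token.term.equals(foldedQuery)) {
                hits.add(token.original);
            }
        }
        return hits;
    }

    public static void main(String[] args) {
        // Unicode escapes keep the source file ASCII-only.
        List<FoldedToken> index = analyze("Le caf\u00e9 \u00e9tait ferm\u00e9");
        System.out.println(search(index, "cafe"));
    }
}
```

The point of the sketch is the shape of the data: matching happens on the folded term, while the accented original travels with each token. In the one-word-per-document case the original could simply live in a stored field instead.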