To: lucene-dev@jakarta.apache.org
Subject: special character with lucene
From: Philipp_Breuss@sonydadc.com
Date: Mon, 28 Feb 2005 16:01:05 +0100

Hello,

I would like to build a search engine that covers several different languages, e.g. Spanish names, French names, ...
- Using a different analyzer for each language would be one solution.
- But how about replacing each special character (umlauts such as ä, ö, ...) with its HTML character entity before indexing, and doing the same with each search query before searching?

This seems to me the simplest approach to handling this issue. What are the drawbacks? No stemmed search? Other considerations?

Greetings,
Philipp
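
To make the second idea concrete, here is a minimal sketch of the entity replacement, written in Python rather than against the Lucene API. The function name `to_html_entities` and the toy dictionary index are hypothetical and only illustrate that the same normalization has to run on both the indexed text and the query; how a real analyzer would tokenize strings containing `&` and `;` is not covered here.

```python
from html.entities import codepoint2name


def to_html_entities(text: str) -> str:
    """Replace every non-ASCII character with an HTML entity (hypothetical helper)."""
    out = []
    for ch in text:
        cp = ord(ch)
        if cp <= 127:
            # Plain ASCII is left untouched.
            out.append(ch)
        elif cp in codepoint2name:
            # Named entity from the HTML 4.01 table, e.g. ä -> &auml;
            out.append(f"&{codepoint2name[cp]};")
        else:
            # No named entity available: fall back to a numeric reference.
            out.append(f"&#{cp};")
    return "".join(out)


def normalize(text: str) -> str:
    # Lowercase first, then encode, so index and query agree on case.
    return to_html_entities(text.lower())


# Toy "index": normalized name -> original name (stands in for a real index).
names = ["Müller", "José", "Smith"]
index = {normalize(name): name for name in names}

# The query goes through exactly the same normalization before lookup.
hit = index.get(normalize("MÜLLER"))
print(hit)
```

Note that with this scheme a query for "Muller" (no umlaut) would not match "Müller", since the two normalize to different strings.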