From: "Martin Sevigny"
To: "'Lucene Developers List'"
Subject: RE : [PATCH] Refactoring QueryParser.jj, setLowercaseWildcardTerms()
Date: Thu, 13 Feb 2003 00:33:38 +0100
Message-ID: <001c01c2d2ef$2b9d6ff0$697ba8c0@SYMPATICOCA>

Hi,

> > Also, I think we should lowercase prefix and wildcard queries by
> > default. This would fix one of the most frequently reported problems.
> > Yes, it might also break folks who currently do case-sensitive
> > wildcard queries, but I suspect they are far fewer than those who will
> > continue to complain about the default case-sensitivity of wildcard
> > searches. What do others think?
>
> For the StandardAnalyzer this might work, but for the GermanAnalyzer
> there is also the problem of umlauts (ä, ö, ü) being turned into
> vowels (a, o, u) while indexing. An example: "Häuser" is the plural
> of "Haus". If I index "Häuser" it is stemmed to "hau". If I search
> for "häus*", for example, nothing is found, because "häus" is not
> stemmed. If I analyzed "häus*" I should get "hau*". The problem is
> that you then get not only "Häuser" but also "Haus" as a result. But
> I think it is better to get more results than no result. This is
> perhaps a special problem with the GermanAnalyzer. Maybe there could
> be an option to use the Analyzer for wildcard queries as well, so
> that I can turn it on in my case while it defaults to off. Hope you
> understand my problem ;)

I second that. The same is true for many languages, where a "standard"
analyzer usually does more than lowercase terms: it also removes
diacritics, as in the example above, and possibly applies stemming.

Lucene is a wonderful tool for building i18n-ready search engines, let's
not forget it ;-)

Martin Sévigny
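
To make the point about diacritics concrete, here is a minimal sketch of what folding a wildcard term before searching could look like. It is not Lucene code and not the patch under discussion: the class WildcardTermNormalizer and its fold method are hypothetical names, it assumes a JDK that provides java.text.Normalizer, and it only lowercases and strips diacritics. It does not stem, so it would give "haus*" rather than the "hau*" a stemming analyzer would need.

```java
import java.text.Normalizer;
import java.util.Locale;

// Hypothetical helper for illustration only; not part of Lucene.
final class WildcardTermNormalizer {

    // Lowercases a wildcard term and strips diacritics.
    // The wildcard characters '*' and '?' are unaffected by both steps,
    // so the whole term can be processed in one pass.
    static String fold(String term) {
        // NFD splits "ä" into "a" plus a combining diaeresis (U+0308).
        String decomposed = Normalizer.normalize(term, Normalizer.Form.NFD);
        // \p{M} matches combining marks; removing them leaves the base letters.
        String stripped = decomposed.replaceAll("\\p{M}", "");
        return stripped.toLowerCase(Locale.ROOT);
    }

    public static void main(String[] args) {
        // "H\u00e4us*" is "Häus*" written with a Unicode escape.
        System.out.println(fold("H\u00e4us*")); // prints "haus*"
    }
}
```

Whether such folding matches the index depends entirely on what the indexing analyzer did, which is why an opt-in switch, as suggested in the quoted message, seems the safer design.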