From: Tatu Saloranta
Reply-To: tatu@hypermall.net
To: "Lucene Users List"
Subject: Re: Zero hits for queries ending with a number
Date: Sat, 3 Apr 2004 09:24:02 -0700
In-Reply-To: <200404031734.06254.lucene@nitwit.de>
Message-Id: <200404030924.02896.tatu@hypermall.net>

On Saturday 03 April 2004 08:34, lucene@nitwit.de wrote:
> On Saturday 03 April 2004 17:11, Erik Hatcher wrote:
> > No objections
> > that error messages and such could be made clearer.
> > Patches welcome! Care to submit better error message handling in this
> > case? Or perhaps allow lower-case "to"?
>
> I think the best would be if Lucene would simply have a
> setCaseSensitive(boolean).
>
> IMHO it's in any case a bad idea to make searches case-sensitive (per
> default).

I'd have to disagree. I think that the search engine core should not have
to bother with details of character sets, such as lower-casing. Rules for
lower/upper/initial/mixed case across all Unicode languages are rather
involved... and if you tried to handle that, the next question would be
whether accents and umlaut marks should matter or not (which is language
dependent). That's why, to me, the natural way to go is for the core to do
a direct comparison, without any case folding, when executing queries.
This does not prevent anyone from implementing such functionality (see
below).

I think the architecture and design of the Lucene core are delightfully
simple. One can easily get case-insensitive behavior by using proper
analyzers and (for the most part) configuring QueryParser.

I would agree, however, that QueryParser is a "victim of its own success":
it's too often used in situations where one really should create a proper
GUI that builds the query. Backend code can then mangle the input as it
sees fit and build query objects. QueryParser is more natural for
quick-n-dirty scenarios, where one just has to slap something together
quickly, or where one only has a textual interface to deal with. It's a
nice thing to have, but it has its limitations; there's no way to create
one parser that's perfect for every use(r).

What could be done, perhaps, is to make sure all the examples / demo web
apps implement case-insensitive indexing and searching, since that is
often what is needed.

-+ Tatu +-

>
> > But, also, folks need to really step back and practice basic
> > troubleshooting skills.
> > I asked you if that string was what you passed
> > to the QueryParser and you said yes, when in fact it was not. And you
>
> I forgot that I did lower-case it. In fact I even output it in its original
> state but lower-case it just before I pass it to lucene. That lower-casing
> is what I would call a hack and hence it's no surprise that I forgot it :-)
>
> Timo
>
> ---------------------------------------------------------------------
> To unsubscribe, e-mail: lucene-user-unsubscribe@jakarta.apache.org
> For additional commands, e-mail: lucene-user-help@jakarta.apache.org
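Tatu's point, that the core can keep comparing terms exactly while case-insensitivity comes from running the same analysis over both the indexed text and the query, can be modeled in a few lines. This is a minimal, hypothetical Java sketch and not Lucene's API: the names CaseFoldingSketch, Analyzer (a stand-in, not Lucene's Analyzer class), Index, EXACT, LOWERCASE and LOWERCASE_NO_ACCENTS are invented for illustration, and the tokenizer is an assumed split on non-letter/non-digit characters. It also illustrates why case rules are involved (in a Turkish locale, "I" lower-cases to a dotless "ı") and that accent stripping is a separate, language-dependent choice.

```java
import java.text.Normalizer;
import java.util.ArrayList;
import java.util.Collections;
import java.util.HashMap;
import java.util.HashSet;
import java.util.List;
import java.util.Locale;
import java.util.Map;
import java.util.Set;

// Hypothetical sketch, not Lucene's API: the index "core" matches terms
// exactly; case (and accent) handling lives only in the analysis step,
// which is applied to both indexed text and query text.
class CaseFoldingSketch {

    // Hypothetical stand-in for an analyzer: text in, normalized terms out.
    interface Analyzer {
        List<String> analyze(String text);
    }

    // Assumed tokenizer: split on anything that is not a letter or digit.
    static List<String> split(String text) {
        List<String> out = new ArrayList<>();
        for (String t : text.split("[^\\p{L}\\p{N}]+")) {
            if (!t.isEmpty()) out.add(t);
        }
        return out;
    }

    // Leaves tokens untouched, so matching stays case-sensitive.
    static final Analyzer EXACT = CaseFoldingSketch::split;

    // Lower-cases with Locale.ROOT so results don't depend on the default locale.
    static final Analyzer LOWERCASE = text -> {
        List<String> out = new ArrayList<>();
        for (String t : split(text)) out.add(t.toLowerCase(Locale.ROOT));
        return out;
    };

    // Also drops combining marks such as umlauts; whether that is wanted
    // is a language-dependent decision.
    static final Analyzer LOWERCASE_NO_ACCENTS = text -> {
        List<String> out = new ArrayList<>();
        for (String t : LOWERCASE.analyze(text)) {
            String decomposed = Normalizer.normalize(t, Normalizer.Form.NFD);
            out.add(decomposed.replaceAll("\\p{M}", ""));
        }
        return out;
    };

    // The "core": an exact-match map from term to document ids.
    static final class Index {
        private final Analyzer analyzer;
        private final Map<String, Set<Integer>> postings = new HashMap<>();

        Index(Analyzer analyzer) {
            this.analyzer = analyzer;
        }

        void add(int docId, String text) {
            for (String term : analyzer.analyze(text)) {
                postings.computeIfAbsent(term, k -> new HashSet<>()).add(docId);
            }
        }

        // Ids of documents containing every query term (AND semantics).
        Set<Integer> search(String query) {
            Set<Integer> result = null;
            for (String term : analyzer.analyze(query)) {
                Set<Integer> docs = postings.getOrDefault(term, Collections.emptySet());
                if (result == null) {
                    result = new HashSet<>(docs);
                } else {
                    result.retainAll(docs);
                }
            }
            if (result == null) result = new HashSet<>();
            return result;
        }
    }

    public static void main(String[] args) {
        String doc = "Zero hits for queries ending with a NUMBER";
        Index exact = new Index(EXACT);
        Index folded = new Index(LOWERCASE);
        exact.add(1, doc);
        folded.add(1, doc);
        System.out.println(exact.search("number"));  // [] (core compares exactly)
        System.out.println(folded.search("NuMbEr")); // [1] (same core, folding analyzer)

        // Case rules depend on the language: Turkish maps 'I' to dotless i (U+0131).
        System.out.println("TITLE".toLowerCase(Locale.ROOT)); // title
        System.out.println("TITLE".toLowerCase(Locale.forLanguageTag("tr")));

        Index noAccents = new Index(LOWERCASE_NO_ACCENTS);
        noAccents.add(2, "M\u00fcller");
        System.out.println(noAccents.search("muller")); // [2]
    }
}
```

The design choice mirrors the argument in the post: swapping EXACT for LOWERCASE changes matching behavior without touching Index at all, and the same analyzer must be used at index and query time or the folded terms will not line up.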