From: "Sale, Doug"
To: 'Lucene Developers List'
Subject: RE: joker * problem
Date: Tue, 4 Feb 2003 09:10:20 -0600
Message-ID: <3E77AFA3658B514B9D8BB982ADDC934E39AD8A@chiex02.britannica.net>

this came up on the list a day or two ago... i believe that someone said
that wildcard queries of the form "*" are not run through the analyzer.
methinks this is probably because they don't want any stemming to be done
on the partial term...

what you really need in this case is to employ the same analyzer used in
the indexing, but without plural or suffix (porter) stemming, and without
removing the wildcard chars. or, for a dirty hack, "query.toLowerCase()".

anyway, i believe this is a "feature". interesting problem - anyone?

-doug

> -----Original Message-----
> From: Ralph Schaer [mailto:ralphschaer@yahoo.com]
> Sent: Tuesday, February 04, 2003 1:33 AM
> To: lucene-dev@jakarta.apache.org
> Subject: joker * problem
>
> Hello
> I found a problem with the joker * and lower/uppercase search
> strings. (latest nightly build)
>
> Here's the index
> IndexWriter writer = new IndexWriter("c:\\temp\\ix", new StandardAnalyzer(), true);
> Document doc = new Document();
> doc.add(Field.UnStored("txt", "Onetwo"));
> doc.add(Field.UnStored("txt", "two three"));
> doc.add(Field.UnIndexed("id", "1"));
> writer.addDocument(doc);
> writer.optimize();
> writer.close();
> Searcher searcher = new IndexSearcher("c:\\temp\\ix");
>
> Without the joker I can enter the search string lower or uppercase.
> Both queries find the document:
> Query query = QueryParser.parse("onetwo", "txt", new StandardAnalyzer());
> Query query = QueryParser.parse("Onetwo", "txt", new StandardAnalyzer());
>
> But with the joker * the uppercase version does not find the document:
> Query query = QueryParser.parse("one*", "txt", new StandardAnalyzer()); <-- document found
> Query query = QueryParser.parse("One*", "txt", new StandardAnalyzer()); <-- no document found
>
> Regards
> Ralph
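
A minimal sketch of the "dirty hack" suggested in the reply, under stated assumptions. Calling toLowerCase() on the whole query string would also lowercase operators such as AND, so this version lowercases only the whitespace-separated tokens that contain a wildcard character. The class WildcardLowercaser and the method lowercaseWildcardTerms are hypothetical names made up for illustration and are not part of Lucene. The sketch also assumes the index was built with an analyzer that lowercases terms, as StandardAnalyzer does in the example above, and it does not try to handle field prefixes or quoted phrases.

```java
import java.util.Locale;

// Hypothetical helper, not part of Lucene: lowercases only those
// whitespace-separated tokens of a query string that contain '*' or '?'.
final class WildcardLowercaser {

    private WildcardLowercaser() {}

    static String lowercaseWildcardTerms(String query) {
        StringBuilder out = new StringBuilder(query.length());
        int i = 0;
        while (i < query.length()) {
            // Copy any run of whitespace through unchanged.
            int start = i;
            while (i < query.length() && Character.isWhitespace(query.charAt(i))) {
                i++;
            }
            out.append(query, start, i);

            // Read the next token (a run of non-whitespace characters).
            start = i;
            while (i < query.length() && !Character.isWhitespace(query.charAt(i))) {
                i++;
            }
            String token = query.substring(start, i);

            // Lowercase the token only if it carries a wildcard, so that
            // operators such as AND, OR and NOT keep their case.
            boolean hasWildcard = token.indexOf('*') >= 0 || token.indexOf('?') >= 0;
            out.append(hasWildcard ? token.toLowerCase(Locale.ROOT) : token);
        }
        return out.toString();
    }

    public static void main(String[] args) {
        System.out.println(lowercaseWildcardTerms("One*"));
        System.out.println(lowercaseWildcardTerms("One* AND Two"));
    }
}
```

The result would then be handed to the parser in place of the raw user input, for example QueryParser.parse(WildcardLowercaser.lowercaseWildcardTerms("One*"), "txt", new StandardAnalyzer()). Non-wildcard terms are left alone because the analyzer already normalizes them.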