Date: Fri, 6 Mar 2009 11:52:06 +0000
Subject: Re: indexing but not tokenizing
From: Ian Lea
To: java-user@lucene.apache.org

I don't know how QueryParser works behind the scenes but it looks like
this is at least known behaviour.  From the QueryParser javadocs:

setLowercaseExpandedTerms

public void setLowercaseExpandedTerms(boolean lowercaseExpandedTerms)

    Whether terms of wildcard, prefix, fuzzy and range queries are to
    be automatically lower-cased or not. Default is true.

So you will need to call parser.setLowercaseExpandedTerms(false) in
this case.

Might be a problem if you are parsing a complex query with multiple
range or other expanded queries, some of which you want preserved,
some not.  If things are that complex you'll be better off creating
your queries via RangeQuery etc.  It isn't hard and you can still use
QueryParser where appropriate - add the resultant queries to a
BooleanQuery or whatever.

--
Ian.

On Fri, Mar 6, 2009 at 9:33 AM, John Marks wrote:
> Another problem.
>
> Using the PerFieldAnalyzerWrapper solves the case where I have a
> simple query, such as the following:
>      Query query = parser.parse("X");
> or
>      Query query = parser.parse("X OR Y");
> but if I use a more complex query like the following:
>      Query query = parser.parse("[A TO Z]");
> then, again, the parser transforms the query to lowercase, as shown in
> the code below.
>
> Output is:
>      Query: B:[a TO z]
>      0 total matching documents
> while I would have expected to get
>      Query: B:[A TO Z]
>       ...
>
> This means that even the KeywordAnalyzer converts A and Z to lowercase
> in the range query?
>
> Should I report this as a bug?
>
> -John
>
>
>
> --- code ---
> package test;
>
> import org.apache.lucene.analysis.PerFieldAnalyzerWrapper;
> import org.apache.lucene.analysis.SimpleAnalyzer;
> import org.apache.lucene.analysis.KeywordAnalyzer;
> import org.apache.lucene.store.RAMDirectory;
> import org.apache.lucene.index.IndexWriter;
> import org.apache.lucene.search.IndexSearcher;
> import org.apache.lucene.search.Query;
> import org.apache.lucene.search.TopDocCollector;
> import org.apache.lucene.search.ScoreDoc;
> import org.apache.lucene.document.Document;
> import org.apache.lucene.document.Field;
> import org.apache.lucene.queryParser.QueryParser;
>
>
>
> public class Test
> {
>  public static void main(String[] args)
>  {
>    try
>    {
>      RAMDirectory idx = new RAMDirectory();
>
>      PerFieldAnalyzerWrapper aWrapper =
>        new PerFieldAnalyzerWrapper(new SimpleAnalyzer());
>      aWrapper.addAnalyzer("B", new KeywordAnalyzer());
>
>      IndexWriter writer = new IndexWriter(idx, aWrapper, true,
>          IndexWriter.MaxFieldLength.LIMITED);
>
>      Document doc = new Document();
>      doc.add(new Field("A", "X",
>          Field.Store.YES, Field.Index.NO));
>      doc.add(new Field("B", "X",
>          Field.Store.YES, Field.Index.NOT_ANALYZED));
>      doc.add(new Field("C", "X",
>          Field.Store.YES, Field.Index.ANALYZED));
>      doc.add(new Field("D", "X",
>          Field.Store.NO, Field.Index.NOT_ANALYZED));
>      doc.add(new Field("E", "X",
>          Field.Store.NO, Field.Index.ANALYZED));
>      writer.addDocument(doc);
>      writer.close();
>
>      IndexSearcher searcher = new IndexSearcher(idx);
>      String field = "B";
>      QueryParser parser = new QueryParser(field, aWrapper);
>      Query query = parser.parse("[A TO Z]");
>      System.out.println("Query: " + query.toString());
>
>      TopDocCollector collector = new TopDocCollector(1);
>      searcher.search(query, collector);
>      int numHits = collector.getTotalHits();
>      System.out.println(numHits + " total matching documents");
>
>      if ( numHits > 0)
>      {
>        ScoreDoc[] hits = collector.topDocs().scoreDocs;
>        doc = searcher.doc(hits[0].doc);
>        System.out.println("A: " + doc.get("A"));
>        System.out.println("B: " + doc.get("B"));
>        System.out.println("C: " + doc.get("C"));
>        System.out.println("D: " + doc.get("D"));
>        System.out.println("E: " + doc.get("E"));
>      }
>    }
>    catch (Exception e)
>    {
>      System.out.println(" caught a " + e.getClass() + "\n with message: "
>          + e.getMessage());
>    }
>  }
>
> }
>
> ---------------------------------------------------------------------
> To unsubscribe, e-mail: java-user-unsubscribe@lucene.apache.org
> For additional commands, e-mail: java-user-help@lucene.apache.org
>
>
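
For anyone wondering why the lower-cased range B:[a TO z] finds nothing
when the indexed term is an upper-case "X": a rough sketch in plain Java
(no Lucene involved) of the comparison at work.  It assumes untokenized
terms are compared in plain character order, which is what
String.compareTo does and which should hold for ASCII terms like these.
The inRange helper is made up for illustration and is not a Lucene call.

```java
import java.util.Locale;

class RangeCaseDemo {
    // Hypothetical helper, not part of Lucene: inclusive range check
    // using plain character order (String.compareTo).
    static boolean inRange(String term, String lower, String upper) {
        return term.compareTo(lower) >= 0 && term.compareTo(upper) <= 0;
    }

    public static void main(String[] args) {
        // Field "B" is indexed verbatim, so the term stays upper-case.
        String indexedTerm = "X";
        String lower = "A";
        String upper = "Z";

        // Bounds as typed in the query string: "X" lies between "A" and "Z".
        System.out.println("[A TO Z] contains X: "
            + inRange(indexedTerm, lower, upper));

        // Bounds after lower-casing: 'X' (U+0058) sorts before 'a' (U+0061),
        // so the term falls outside the range.
        System.out.println("[a TO z] contains X: "
            + inRange(indexedTerm,
                      lower.toLowerCase(Locale.ROOT),
                      upper.toLowerCase(Locale.ROOT)));
    }
}
```

The first check is true and the second is false, which matches the
"0 total matching documents" John saw and is why turning off the
lower-casing with setLowercaseExpandedTerms(false) should help here.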