Mailing-List: contact java-user-help@lucene.apache.org; run by ezmlm
Precedence: bulk
Reply-To: java-user@lucene.apache.org
MIME-Version: 1.0
In-Reply-To: <324177540903060133o1845d68fl7cbd1dcc70ab2d27@mail.gmail.com>
References: <324177540903052234r3be27ddbi40508921204c1f3c@mail.gmail.com>
	 <324177540903060133o1845d68fl7cbd1dcc70ab2d27@mail.gmail.com>
Date: Fri, 6 Mar 2009 11:52:06 +0000
Message-ID: <8c4e68610903060352n5859178ftbfb2190e583cdb8f@mail.gmail.com>
Subject: Re: indexing but not tokenizing
From: Ian Lea <ian.lea@gmail.com>
To: java-user@lucene.apache.org
Content-Type: text/plain; charset=ISO-8859-1
Content-Transfer-Encoding: quoted-printable

I don't know how QueryParser works behind the scenes, but it looks like
this is documented behaviour rather than a bug.  From the QueryParser
javadocs:

setLowercaseExpandedTerms

public void setLowercaseExpandedTerms(boolean lowercaseExpandedTerms)

    Whether terms of wildcard, prefix, fuzzy and range queries are to
    be automatically lower-cased or not. Default is true.
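
A rough sketch of why the lower-cased range finds nothing, assuming the
range endpoints are compared with the indexed term as plain strings
(field B is NOT_ANALYZED, so its term is the literal "X").  RangeCheck
and inRange are made-up names for illustration, not Lucene API:

```java
public class RangeCheck {
    // Hypothetical helper: true if term falls inside [lower, upper]
    // using plain String.compareTo ordering (inclusive on both ends).
    static boolean inRange(String term, String lower, String upper) {
        return term.compareTo(lower) >= 0 && term.compareTo(upper) <= 0;
    }

    public static void main(String[] args) {
        // 'X' (0x58) sorts before 'a' (0x61), so it is outside [a TO z].
        System.out.println(inRange("X", "a", "z")); // false
        // With the case preserved, 'X' sits between 'A' and 'Z'.
        System.out.println(inRange("X", "A", "Z")); // true
    }
}
```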


So you will need to call parser.setLowercaseExpandedTerms(false) in
this case.  The setting applies to the whole parser, so it might be a
problem if you are parsing a complex query with multiple range or other
expanded queries, some of which you want case-preserved and some not.
If things are that complex you'll be better off creating those queries
directly via RangeQuery etc.  It isn't hard, and you can still use
QueryParser where appropriate: add the resultant queries to a
BooleanQuery or whatever.
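
Something along these lines should do for both options.  Untested
sketch, assuming the same Lucene version and analyzer setup as the
program quoted below; the class and method names are made up, and
using field C for the parsed part is only an example:

```java
import org.apache.lucene.analysis.KeywordAnalyzer;
import org.apache.lucene.analysis.PerFieldAnalyzerWrapper;
import org.apache.lucene.analysis.SimpleAnalyzer;
import org.apache.lucene.index.Term;
import org.apache.lucene.queryParser.ParseException;
import org.apache.lucene.queryParser.QueryParser;
import org.apache.lucene.search.BooleanClause;
import org.apache.lucene.search.BooleanQuery;
import org.apache.lucene.search.Query;
import org.apache.lucene.search.RangeQuery;

public class RangeQueryOptions {
    // Same analyzer setup as the quoted test program:
    // SimpleAnalyzer by default, KeywordAnalyzer for field B.
    static PerFieldAnalyzerWrapper analyzer() {
        PerFieldAnalyzerWrapper wrapper =
            new PerFieldAnalyzerWrapper(new SimpleAnalyzer());
        wrapper.addAnalyzer("B", new KeywordAnalyzer());
        return wrapper;
    }

    // Option 1: keep QueryParser but switch off lower-casing of
    // wildcard, prefix, fuzzy and range terms for this parser.
    static Query parsedRange(String text) throws ParseException {
        QueryParser parser = new QueryParser("B", analyzer());
        parser.setLowercaseExpandedTerms(false);
        return parser.parse(text);
    }

    // Option 2: build the case-sensitive range by hand and combine it
    // with an ordinary parsed query in a BooleanQuery.
    static Query parsedPlusRange(String text) throws ParseException {
        QueryParser parser = new QueryParser("C", analyzer());
        Query parsed = parser.parse(text);
        // Inclusive range on field B; endpoints are used exactly as given.
        Query range = new RangeQuery(new Term("B", "A"),
                                     new Term("B", "Z"), true);
        BooleanQuery combined = new BooleanQuery();
        combined.add(parsed, BooleanClause.Occur.MUST);
        combined.add(range, BooleanClause.Occur.MUST);
        return combined;
    }

    public static void main(String[] args) throws ParseException {
        System.out.println("Query: " + parsedRange("[A TO Z]"));
        System.out.println("Query: " + parsedPlusRange("x"));
    }
}
```

The second form leaves the parser's default lower-casing alone for
everything else, which is the point when only some expanded terms need
their case preserved.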


--
Ian.


On Fri, Mar 6, 2009 at 9:33 AM, John Marks <a85533109@gmail.com> wrote:
> Another problem.
>
> Using the PerFieldAnalyzerWrapper solves the case where I have a
> simple query, such as the following:
>       Query query = parser.parse("X");
> or
>       Query query = parser.parse("X OR Y");
> but if I use a more complex query like the following:
>       Query query = parser.parse("[A TO Z]");
> then, again, the parser transforms the query to lowercase, as shown in
> the code below.
>
> Output is:
>       Query: B:[a TO z]
>       0 total matching documents
> while I would have expected to get
>       Query: B:[A TO Z]
>        ...
>
> This means that even the KeywordAnalyzer converts A and Z to lowercase
> in the range query?
>
> Should I report this as a bug?
>
> -John
>
>
>
> --- code ---
> package test;
>
> import org.apache.lucene.analysis.PerFieldAnalyzerWrapper;
> import org.apache.lucene.analysis.SimpleAnalyzer;
> import org.apache.lucene.analysis.KeywordAnalyzer;
> import org.apache.lucene.store.RAMDirectory;
> import org.apache.lucene.index.IndexWriter;
> import org.apache.lucene.search.IndexSearcher;
> import org.apache.lucene.search.Query;
> import org.apache.lucene.search.TopDocCollector;
> import org.apache.lucene.search.ScoreDoc;
> import org.apache.lucene.document.Document;
> import org.apache.lucene.document.Field;
> import org.apache.lucene.queryParser.QueryParser;
>
>
>
> public class Test
> {
>   public static void main(String[] args)
>   {
>     try
>     {
>       RAMDirectory idx = new RAMDirectory();
>
>       PerFieldAnalyzerWrapper aWrapper =
>         new PerFieldAnalyzerWrapper(new SimpleAnalyzer());
>       aWrapper.addAnalyzer("B", new KeywordAnalyzer());
>
>       IndexWriter writer = new IndexWriter(idx, aWrapper, true,
>           IndexWriter.MaxFieldLength.LIMITED);
>
>       Document doc = new Document();
>       doc.add(new Field("A", "X",
>           Field.Store.YES, Field.Index.NO));
>       doc.add(new Field("B", "X",
>           Field.Store.YES, Field.Index.NOT_ANALYZED));
>       doc.add(new Field("C", "X",
>           Field.Store.YES, Field.Index.ANALYZED));
>       doc.add(new Field("D", "X",
>           Field.Store.NO, Field.Index.NOT_ANALYZED));
>       doc.add(new Field("E", "X",
>           Field.Store.NO, Field.Index.ANALYZED));
>       writer.addDocument(doc);
>       writer.close();
>
>       IndexSearcher searcher = new IndexSearcher(idx);
>       String field = "B";
>       QueryParser parser = new QueryParser(field, aWrapper);
>       Query query = parser.parse("[A TO Z]");
>       System.out.println("Query: " + query.toString());
>
>       TopDocCollector collector = new TopDocCollector(1);
>       searcher.search(query, collector);
>       int numHits = collector.getTotalHits();
>       System.out.println(numHits + " total matching documents");
>
>       if ( numHits > 0)
>       {
>         ScoreDoc[] hits = collector.topDocs().scoreDocs;
>         doc = searcher.doc(hits[0].doc);
>         System.out.println("A: " + doc.get("A"));
>         System.out.println("B: " + doc.get("B"));
>         System.out.println("C: " + doc.get("C"));
>         System.out.println("D: " + doc.get("D"));
>         System.out.println("E: " + doc.get("E"));
>       }
>     }
>     catch (Exception e)
>     {
>       System.out.println(" caught a " + e.getClass() + "\n with message: "
>           + e.getMessage());
>     }
>   }
>
> }
>
> ---------------------------------------------------------------------
> To unsubscribe, e-mail: java-user-unsubscribe@lucene.apache.org
> For additional commands, e-mail: java-user-help@lucene.apache.org
>
>
