From: Philip Brown
To: java-user@lucene.apache.org
Subject: Re: Phrase search using quotes -- special Tokenizer
Date: Tue, 5 Sep 2006 14:06:13 -0700 (PDT)
Message-ID: <6160316.post@talk.nabble.com>

Here's a little sample program (borrowed some code from Erick Erickson :)).
Whether I add as TOKENIZED or UN_TOKENIZED seems to make no difference in
the output. Is this what you'd expect?

- Philip

package com.test;

import java.io.IOException;

import org.apache.lucene.analysis.Analyzer;
import org.apache.lucene.analysis.KeywordAnalyzer;
import org.apache.lucene.analysis.PerFieldAnalyzerWrapper;
import org.apache.lucene.analysis.standard.StandardAnalyzer;
import org.apache.lucene.document.Document;
import org.apache.lucene.document.Field;
import org.apache.lucene.index.IndexWriter;
import org.apache.lucene.queryParser.QueryParser;
import org.apache.lucene.search.Hits;
import org.apache.lucene.search.IndexSearcher;
import org.apache.lucene.search.Query;
import org.apache.lucene.store.RAMDirectory;

public class Test2 {

    private PerFieldAnalyzerWrapper analyzer = null;
    private RAMDirectory idx = null;

    // StandardAnalyzer by default; KeywordAnalyzer for the "keyword" field.
    private Analyzer getAnalyzer() {
        if (analyzer == null) {
            analyzer = new PerFieldAnalyzerWrapper(new StandardAnalyzer());
            analyzer.addAnalyzer("keyword", new KeywordAnalyzer());
        }
        return analyzer;
    }

    // Builds an in-memory index of two documents that differ only in booleanField.
    private void makeTestIndex() throws Exception {
        idx = new RAMDirectory();
        IndexWriter writer = new IndexWriter(idx, getAnalyzer(), true);

        // UN_TOKENIZED indexes the whole value as a single term,
        // without passing it through the analyzer.
        Document doc = new Document();
        doc.add(new Field("keyword", "hello world", Field.Store.YES, Field.Index.UN_TOKENIZED));
        doc.add(new Field("booleanField", "false", Field.Store.YES, Field.Index.UN_TOKENIZED));
        writer.addDocument(doc);

        doc = new Document();
        doc.add(new Field("keyword", "hello world", Field.Store.YES, Field.Index.UN_TOKENIZED));
        doc.add(new Field("booleanField", "true", Field.Store.YES, Field.Index.UN_TOKENIZED));
        writer.addDocument(doc);

        System.out.println(writer.docCount());
        writer.optimize();
        writer.close();
    }

    // Parses the query against the "keyword" field and compares the hit count
    // with the expected value.
    private void doSearch(String query, int expectedHits) throws Exception {
        try {
            QueryParser qp = new QueryParser("keyword", getAnalyzer());
            IndexSearcher srch = new IndexSearcher(idx);
            Query tmp = qp.parse(query);

            // Print the parsed form of the query
            System.out.println("Parsed form is '" + tmp.toString() + "'");

            Hits hits = srch.search(tmp);

            String msg = "";
            if (hits.length() == expectedHits) {
                msg = "Test passed ";
            } else {
                msg = "************TEST FAILED************ ";
            }
            System.out.println(msg + "Expected " + expectedHits
                    + " hits, got " + hits.length() + " hits");
        } catch (IOException e) {
            System.out.println("Caught IOException");
            e.printStackTrace();
        }
    }

    public static void main(String[] args) {
        try {
            Test2 test = new Test2();
            test.makeTestIndex();
            test.doSearch("Hello World", 0);
            test.doSearch("hello world", 0);
            test.doSearch("hello", 0);
            test.doSearch("world", 0);
            test.doSearch("\"Hello World\"", 0);
            test.doSearch("\"hello world\"", 2);
            test.doSearch("\"hello world\" +booleanField:false", 1);
            test.doSearch("\"hello world\" +booleanField:true", 1);
        } catch (Exception e) {
            System.err.println(e.getMessage());
        }
    }
}


Chris Hostetter wrote:
>
> : So, if I do as you suggest below (using PerFieldAnalyzerWrapper with
> : StandardAnalyzer) then I still need to enclose in quotes the phrases
> : (keywords with spaces) when I issue the search, and they are only returned
>
> Yes, quotes will be necessary to tell the QueryParser "this
> is one chunk of text, pass it to the analyzer whole" - but that's so you
> can get the "complex" part of the problem you described... recognizing
> that "my brown-cow" and "red fox" should be matched as separate values
> instead of trying to find one big value containing "my brown-cow red fox"
>
> : in the results if the case is identical to how it was added?  (This seems to
> : be what I observe anyway.  And whether I add as TOKENIZED or UN_TOKENIZED
> : seems to have no effect.)
>
> 1) whether case matters is determined entirely by your analyzer, if it
>    produces different tokens for "Blue" and "BLUE" then case matters
> 2) use TOKENIZED or your Analyzer will be completely irrelevant
> 3) if you observe something working differently than you expect, post the
>    code -- we're way past the point of being able to offer you any
>    meaningful help without seeing a self contained example of what you want
>    to see work.
>
> -Hoss
>
> ---------------------------------------------------------------------
> To unsubscribe, e-mail: java-user-unsubscribe@lucene.apache.org
> For additional commands, e-mail: java-user-help@lucene.apache.org
>

--
View this message in context: http://www.nabble.com/Phrase-search-using-quotes----special-Tokenizer-tf2200760.html#a6160316
Sent from the Lucene - Java Users forum at Nabble.com.
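
For anyone following the thread, here is a rough plain-Java sketch of the point about quotes: an unquoted query is split on whitespace before anything looks at the terms, while a quoted phrase is handed over as one chunk. This is a toy model only, not Lucene code. The class KeywordFieldSketch and its methods chunks and countHits are made-up names, and it assumes the "keyword" field holds each value as a single unanalyzed, case-sensitive term. It ignores the booleanField clause entirely.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

// Toy model only: plain Java, no Lucene classes. All names are hypothetical.
class KeywordFieldSketch {

    // Assumed index contents: one whole-value term per document,
    // mirroring the two "hello world" documents in the test program.
    static final List<String> INDEXED_TERMS = Arrays.asList("hello world", "hello world");

    // A quoted phrase becomes one chunk; unquoted text is split on whitespace.
    static List<String> chunks(String query) {
        List<String> out = new ArrayList<String>();
        Matcher m = Pattern.compile("\"([^\"]*)\"|(\\S+)").matcher(query);
        while (m.find()) {
            out.add(m.group(1) != null ? m.group(1) : m.group(2));
        }
        return out;
    }

    // Counts documents whose single term equals one of the chunks exactly
    // (case-sensitive), like an OR of term queries on an untokenized field.
    static int countHits(String query) {
        List<String> terms = chunks(query);
        int hits = 0;
        for (String indexed : INDEXED_TERMS) {
            if (terms.contains(indexed)) {
                hits++;
            }
        }
        return hits;
    }

    public static void main(String[] args) {
        System.out.println(countHits("hello world"));       // 0: two chunks, neither is the whole value
        System.out.println(countHits("\"hello world\""));   // 2: one chunk equal to the stored term
        System.out.println(countHits("\"Hello World\""));   // 0: case differs from the stored term
    }
}
```

Under these assumptions the sketch gives the same hit counts the test program expects for the pure keyword queries. Whether real Lucene behaves this way for a given field still depends on the analyzer and on how the field was indexed, as Hoss says.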