Subject: Re: StandardAnalyzer and comma
From: Erick Erickson
To: java-user@lucene.apache.org
Date: Wed, 24 Feb 2010 13:39:53 -0500

OK, I'm confused. In your original message, you said that changing
analyzers is NOT an option. Then you said you'll give WhitespaceAnalyzer
a shot....

Assuming your original constraint is accurate, why isn't changing
analyzers an option? Are you aware of PerFieldAnalyzerWrapper, which
allows you to specify different analyzers for different fields? If
absolutely necessary, you could copy the field in question into another
field that you use only for this case, which would isolate this change
from every other part of your index.

Be aware that WhitespaceAnalyzer does NOT fold case, so groupc would not
match groupC. But that's easy to fix. You can either take care to
lowercase your input and query streams yourself, or compose your own
analyzer from, say, LowerCaseFilter and WhitespaceTokenizer to handle
all of that automatically.

HTH
Erick

On Wed, Feb 24, 2010 at 12:10 PM, Murdoch, Paul wrote:

> Thanks for the input. I'll give the WhitespaceAnalyzer a shot.
> Also, AFAIK, Field.Index.NOT_ANALYZED means that the content you index
> is not split into separate tokens so it is searchable, but only for
> exact matches. I may be able to get what I want with the
> WhitespaceAnalyzer and Field.Index.NOT_ANALYZED. Thanks again.
>
> Paul
>
> -----Original Message-----
> From: java-user-return-45134-PAUL.B.MURDOCH=saic.com@lucene.apache.org
> [mailto:java-user-return-45134-PAUL.B.MURDOCH=saic.com@lucene.apache.org]
> On Behalf Of Max Lynch
> Sent: Wednesday, February 24, 2010 11:42 AM
> To: java-user@lucene.apache.org
> Subject: Re: StandardAnalyzer and comma
>
> Personally punctuation matters in my queries so I use
> WhitespaceAnalyzer. I also only want exact hits, so that analyzer
> works well for me.
>
> Also, AFAIK you don't set NOT_ANALYZED if you want to search through it.
>
> On Wed, Feb 24, 2010 at 10:33 AM, Murdoch, Paul wrote:
>
> > I'm using Lucene 2.9. How do I make a comma behave like a regular
> > character using the StandardAnalyzer? Example:
> >
> > I have a field called "choice" and some field values:
> >
> > groupA, morning
> > groupB, noon
> > groupC, night
> > morning
> > noon
> > night
> >
> > So a query choice:night returns "groupC, night" and "night". Well, I
> > only wanted "night". The StandardAnalyzer strips the commas from
> > phrases and splits on whitespace. A phrase query choice:"night"
> > produces the same results. I think indexing the field values as
> > NOT_ANALYZED and making the comma behave as a regular character will
> > solve this.
> >
> > Of course I have thought about choice:(night -groupC). That is not an
> > option because the contents of the index are hidden from the front end
> > where queries are made by users. I looked into changing
> > StandardTokenizerImpl punctuation, but I'm hoping for a more simple
> > solution. Also, changing analyzers is not an option.
> > I could possibly extend the StandardAnalyzer, but how do I set the
> > punctuation settings? Any help here would be great. This seems like
> > it should be an easy fix so I hope I've missed something simple.
> >
> > Thanks,
> >
> > Paul
>
> ---------------------------------------------------------------------
> To unsubscribe, e-mail: java-user-unsubscribe@lucene.apache.org
> For additional commands, e-mail: java-user-help@lucene.apache.org
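
The three indexing choices discussed in this thread can be compared with a rough plain-Java sketch. It does NOT call Lucene: the class TokenizationSketch and the methods standardLike, whitespaceLowercase, notAnalyzed and hasTerm are hypothetical names made up for illustration, and standardLike is only an approximation of what StandardAnalyzer does to the sample values (the real analyzer has more rules, such as stop words). The point is just to show which terms each approach would put in the "choice" field.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.Collections;
import java.util.List;
import java.util.Locale;

public class TokenizationSketch {

    // Approximation (an assumption, not Lucene code) of StandardAnalyzer on
    // these sample values: split on anything that is not a letter or digit,
    // then lowercase each piece.
    public static List<String> standardLike(String value) {
        List<String> tokens = new ArrayList<String>();
        for (String t : value.split("[^\\p{L}\\p{N}]+")) {
            if (!t.isEmpty()) {
                tokens.add(t.toLowerCase(Locale.ROOT));
            }
        }
        return tokens;
    }

    // Stand-in for a whitespace tokenizer followed by a lowercase filter:
    // split on whitespace only, so punctuation stays attached to the token.
    public static List<String> whitespaceLowercase(String value) {
        List<String> tokens = new ArrayList<String>();
        for (String t : value.trim().split("\\s+")) {
            if (!t.isEmpty()) {
                tokens.add(t.toLowerCase(Locale.ROOT));
            }
        }
        return tokens;
    }

    // Stand-in for an un-analyzed field: the whole value is a single term,
    // with case and punctuation untouched.
    public static List<String> notAnalyzed(String value) {
        return Collections.singletonList(value);
    }

    // A term query matches a document if the term is one of its indexed terms.
    public static boolean hasTerm(List<String> tokens, String term) {
        return tokens.contains(term);
    }

    public static void main(String[] args) {
        List<String> values = Arrays.asList(
                "groupA, morning", "groupB, noon", "groupC, night",
                "morning", "noon", "night");
        for (String v : values) {
            System.out.println("value: \"" + v + "\"");
            System.out.println("  standard-like:        " + standardLike(v)
                    + "  matches night: " + hasTerm(standardLike(v), "night"));
            System.out.println("  whitespace+lowercase: " + whitespaceLowercase(v)
                    + "  matches night: " + hasTerm(whitespaceLowercase(v), "night"));
            System.out.println("  not analyzed:         " + notAnalyzed(v)
                    + "  matches night: " + hasTerm(notAnalyzed(v), "night"));
        }
    }
}
```

In this sketch, whitespace tokenization keeps the comma attached to "groupc," but still emits "night" as a separate term for "groupC, night", so a term query for night would match that value as well. Only the whole-value term distinguishes "night" from "groupC, night", which is worth checking against a real index before settling on an approach.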