From: "Rahul D Thakare"
To: java-user@lucene.apache.org
Date: 13 Jul 2005 12:18:44 -0000
Subject: Wild card and multiple keyword search

Hi,

We are using doc.add(Field.Text("keywords", keywords)); to add the keywords to the document, where keywords is a comma-separated string of keywords.
Lucene seems to tokenize multi-word keywords such as MAIN BOARD into separate tokens (i.e. MAIN and BOARD); tokenization appears to be based on commas and spaces. So if we search for "MAIN BOARD", documents having keywords like "MAIN LOGIC", "MAIN PARTS", etc. also show up.

If someone searches for "MAIN BOARD", we want to get only the documents that have the keyword "MAIN BOARD". How can we do this?

To achieve this we used doc.add(Field.Keyword("keywords", keywords)); for indexing. While searching we cannot use the standard analyzer, because it splits the query wherever a keyword contains a space, so we wrote a KeywordAnalyzer (which basically returns the whole input as one single token), as given below.

import java.io.IOException;
import java.io.Reader;

import org.apache.lucene.analysis.Analyzer;
import org.apache.lucene.analysis.Token;
import org.apache.lucene.analysis.TokenStream;

/**
 * Tokenizes the entire stream as a single token.
 */
public class KeywordAnalyzer extends Analyzer
{
  public TokenStream tokenStream(String fieldName, final Reader reader)
  {
    return new TokenStream() {
      // Set once the single token has been returned.
      private boolean done;
      private final char[] chunk = new char[1024];

      public Token next() throws IOException
      {
        if (!done)
        {
          done = true;
          // Read the whole input into one string.
          StringBuffer buffer = new StringBuffer();
          int length = 0;
          while (true)
          {
            length = reader.read(chunk);
            if (length == -1) break;

            buffer.append(chunk, 0, length);
          }
          String text = buffer.toString();
          // The token is upper-cased, so indexed keywords must be upper case too.
          return new Token(text.toUpperCase(), 0, text.length());
        }
        // No more tokens after the first one.
        return null;
      }
    };
  }
}

This solves the problem described above, but we are not able to do wildcard searches like MAIN*, etc.

We need both pieces of functionality, i.e.
1. If a user searches for MAIN BOARD, they should get only documents that contain MAIN BOARD and not MAIN LOGIC, MAIN, MAIN PART, etc.
2. The user should be able to do wildcard searches like MAIN*, etc. and get the desired documents.

Please let us know how we should do the indexing, and which analyzer to use for the search.

thanks
Rahul...
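
To make the two requirements above concrete, here is a minimal plain-Java sketch of the matching behavior being asked for. It does not use Lucene at all. The class KeywordMatchSketch and its methods splitKeywords and matches are hypothetical names made up for this illustration, and it assumes that each comma-separated entry is one whole keyword and that matching is case-insensitive.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.Locale;

/** Plain-Java illustration only; hypothetical names, no Lucene classes are used. */
final class KeywordMatchSketch {

    /** Splits a comma-separated string into whole, trimmed, upper-cased keywords. */
    static List<String> splitKeywords(String keywords) {
        List<String> result = new ArrayList<>();
        for (String part : keywords.split(",")) {
            String keyword = part.trim().toUpperCase(Locale.ROOT);
            if (!keyword.isEmpty()) {
                result.add(keyword);
            }
        }
        return result;
    }

    /** Exact match against a whole keyword; a trailing '*' switches to prefix match. */
    static boolean matches(String keywords, String query) {
        String q = query.trim().toUpperCase(Locale.ROOT);
        boolean prefix = q.endsWith("*");
        if (prefix) {
            q = q.substring(0, q.length() - 1);
        }
        for (String keyword : splitKeywords(keywords)) {
            if (prefix ? keyword.startsWith(q) : keyword.equals(q)) {
                return true;
            }
        }
        return false;
    }

    public static void main(String[] args) {
        // Requirement 1: only the whole keyword "MAIN BOARD" matches.
        System.out.println(matches("MAIN BOARD, POWER SUPPLY", "MAIN BOARD")); // true
        System.out.println(matches("MAIN LOGIC, MAIN PARTS", "MAIN BOARD"));   // false
        // Requirement 2: a trailing wildcard matches by prefix.
        System.out.println(matches("MAIN LOGIC, MAIN PARTS", "MAIN*"));        // true
    }
}
```

The sketch only restates the desired results: whole-keyword matching for requirement 1 and prefix matching for requirement 2. How to get the same behavior from a Lucene index and analyzer is the open question.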