From: "Krishnendra Nandi"
To: java-user@lucene.apache.org
Subject: Fw: Urgent : Specific search problem with whitespace analyzer
Date: Mon, 20 Nov 2006 18:24:15 +0530

Hi,

I am doing a "field:text" kind of search using my own analyzer, which behaves like WhitespaceAnalyzer. Below are the code snippets for my analyzer and tokenizer.

    // WhitespaceAnalyzerMaestro.java
    package com.hewitt.itk.maestro.support.service.simplesearch;

    import java.io.Reader;

    import org.apache.lucene.analysis.Analyzer;
    import org.apache.lucene.analysis.TokenStream;

    /** An Analyzer that uses WhitespaceTokenizerMaestro. */
    public final class WhitespaceAnalyzerMaestro extends Analyzer {
      public TokenStream tokenStream(String fieldName, Reader reader) {
        return new WhitespaceTokenizerMaestro(reader);
      }
    }

    // WhitespaceTokenizerMaestro.java
    package com.hewitt.itk.maestro.support.service.simplesearch;

    import java.io.Reader;

    import org.apache.lucene.analysis.WhitespaceTokenizer;

    /** A WhitespaceTokenizerMaestro is a tokenizer that divides text at whitespace.
     * Adjacent sequences of non-whitespace characters form tokens. */
    public class WhitespaceTokenizerMaestro extends WhitespaceTokenizer {
      /** Construct a new WhitespaceTokenizerMaestro. */
      public WhitespaceTokenizerMaestro(Reader in) {
        super(in);
      }

      /** Collects only characters which do not satisfy
       * {@link Character#isWhitespace(char)}. */
      protected boolean isTokenChar(char c) {
        return !Character.isWhitespace(c);
      }

      /** Lowercases each collected character. Assigning to the parameter
       * inside isTokenChar does not change the token text, so the
       * lowercasing is done in CharTokenizer's normalize hook instead. */
      protected char normalize(char c) {
        return Character.toLowerCase(c);
      }
    }

I have modified the tokenizer class so that it returns characters in lower case.

My search criterion is ISSUE_TITLE:test, where ISSUE_TITLE is the field in which "test" is to be searched. The following snippet performs the search:

    BooleanQuery masterQuery = new BooleanQuery();
    masterQuery.add(
        MultiFieldQueryParser.parse(searchQuery, fields, analyzer),
        REQUIRED, PROHIBITED);

Here searchQuery is ISSUE_TITLE:test, fields is the array of fields (ISSUE_TITLE is one of them), and analyzer is a WhitespaceAnalyzerMaestro (shown above).

When I run the search, the masterQuery built by the snippet above has the following value:

    +(ISSUE_TITLE:test* ISSUE_TITLE:test* ISSUE_TITLE:test* ISSUE_TITLE:test* ISSUE_TITLE:test*
      ISSUE_TITLE:test* ISSUE_TITLE:test* ISSUE_TITLE:test* ISSUE_TITLE:test* ISSUE_TITLE:test*
      ISSUE_TITLE:test* ISSUE_TITLE:test* ISSUE_TITLE:test* ISSUE_TITLE:test* ISSUE_TITLE:test*
      ISSUE_TITLE:test* ISSUE_TITLE:test* ISSUE_TITLE:test* ISSUE_TITLE:test* ISSUE_TITLE:test*
      ISSUE_TITLE:test* ISSUE_TITLE:test* ISSUE_TITLE:test* ISSUE_TITLE:test* ISSUE_TITLE:test*
      ISSUE_TITLE:test* ISSUE_TITLE:test* ISSUE_TITLE:test* ISSUE_TITLE:test* ISSUE_TITLE:test*)

which I think is not correct. Does MultiFieldQueryParser not support WhitespaceAnalyzer?

Please help.

Regards
Krishnendra Nandi
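One possible reading of the repeated clauses, offered as an assumption rather than a confirmed diagnosis: the static MultiFieldQueryParser.parse(query, fields, analyzer) appears to parse the query string once per entry in fields, using that entry as the default field. If the query string already names its field (ISSUE_TITLE:...), the default field is never consulted, so every pass would produce the same clause regardless of the analyzer. The sketch below imitates that behaviour with plain strings. It does not call Lucene, the class and method names are hypothetical, and the field names other than ISSUE_TITLE are made up for illustration.

```java
import java.util.ArrayList;
import java.util.List;

public class MultiFieldExpansionSketch {

    /**
     * Hypothetical stand-in for a single-field parse: a term that already
     * carries a "field:" prefix keeps it, otherwise the default field is used.
     */
    public static String parseWithDefaultField(String term, String defaultField) {
        return term.indexOf(':') >= 0 ? term : defaultField + ":" + term;
    }

    /** Parses the same term once per field and joins the resulting clauses. */
    public static String expand(String term, String[] fields) {
        List<String> clauses = new ArrayList<>();
        for (String field : fields) {
            clauses.add(parseWithDefaultField(term, field));
        }
        return "(" + String.join(" ", clauses) + ")";
    }

    public static void main(String[] args) {
        // ISSUE_DESC and ISSUE_OWNER are invented field names.
        String[] fields = {"ISSUE_TITLE", "ISSUE_DESC", "ISSUE_OWNER"};

        // Field-qualified term: the same clause comes back once per field.
        System.out.println(expand("ISSUE_TITLE:test*", fields));

        // Bare term: each field gets its own clause.
        System.out.println(expand("test*", fields));
    }
}
```

If this reading is right, passing the bare term (without the ISSUE_TITLE: prefix) to the multi-field parser, or parsing the field-qualified string with a plain QueryParser, should avoid the repetition.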