Mailing-List: contact java-user-help@lucene.apache.org; run by ezmlm
Precedence: bulk
Reply-To: java-user@lucene.apache.org
Received-SPF: pass (herse.apache.org: local policy)
To: java-user@lucene.apache.org
Subject: Fw: Urgent : Specific search problem with whitespace analyzer
MIME-Version: 1.0
Message-ID: 
 <OF87DBB200.4A74D651-ON6525722C.0046B79D-6525722C.0046E65C@hewitt.com>
From: "Krishnendra Nandi" <krishnendra.nandi@hewitt.com>
Date: Mon, 20 Nov 2006 18:24:15 +0530
Content-Type: multipart/alternative;
 boundary="=_alternative 0046E6596525722C_="

--=_alternative 0046E6596525722C_=
Content-Type: text/plain;
 charset=us-ascii
Content-Transfer-Encoding: 7bit

Hi,

I am running a "field:text" style search using my own analyzer, which 
behaves like WhitespaceAnalyzer. Below are the code snippets for my custom 
analyzer and tokenizer.


// WhitespaceAnalyzerMaestro.java
package com.hewitt.itk.maestro.support.service.simplesearch;

import java.io.Reader;

import org.apache.lucene.analysis.Analyzer;
import org.apache.lucene.analysis.TokenStream;

/** An Analyzer that uses WhitespaceTokenizer. */

public final class WhitespaceAnalyzerMaestro extends Analyzer {
  public TokenStream tokenStream(String fieldName, Reader reader) {
    return new WhitespaceTokenizerMaestro(reader);
  }
} 


// WhitespaceTokenizerMaestro.java
package com.hewitt.itk.maestro.support.service.simplesearch;

import java.io.Reader;

import org.apache.lucene.analysis.WhitespaceTokenizer;

/** A WhitespaceTokenizerMaestro is a tokenizer that divides text at whitespace.
 * Adjacent sequences of non-whitespace characters form tokens. */

public class WhitespaceTokenizerMaestro extends WhitespaceTokenizer {
  /** Construct a new WhitespaceTokenizerMaestro. */
  public WhitespaceTokenizerMaestro(Reader in) {
    super(in);
  }

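  /** Collects only characters which do not satisfy
   * {@link Character#isWhitespace(char)}. */
  protected boolean isTokenChar(char c) {
    return !Character.isWhitespace(c);
  }

  /** Lowercases each collected character before it is added to the token.
   * Reassigning c inside isTokenChar has no effect on the token text. */
  protected char normalize(char c) {
    return Character.toLowerCase(c);
  }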
}


The only change in my tokenizer class is that it is meant to return 
characters in lower case.
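
To make the intended behaviour concrete, here is a minimal plain-Java sketch 
of whitespace tokenizing with lowercasing. It does not use Lucene at all; the 
class name LowercaseWhitespaceSketch and its tokenize method are hypothetical 
and only mirror the isTokenChar/normalize split. The point it illustrates is 
that a char parameter is a local copy, so the lowercasing has to happen where 
the character is appended to the token, not inside the yes/no check.

```java
import java.util.ArrayList;
import java.util.List;

/** Plain-Java sketch (no Lucene): split at whitespace, lowercase each kept character. */
class LowercaseWhitespaceSketch {

    /** Decides only whether a character belongs to a token. */
    public static boolean isTokenChar(char c) {
        // Reassigning c here would change only this method's local copy.
        return !Character.isWhitespace(c);
    }

    /** Transforms a character that has already been accepted into a token. */
    public static char normalize(char c) {
        return Character.toLowerCase(c);
    }

    /** Hypothetical helper: returns the lowercased whitespace-separated tokens of text. */
    public static List<String> tokenize(String text) {
        List<String> tokens = new ArrayList<>();
        StringBuilder current = new StringBuilder();
        for (int i = 0; i < text.length(); i++) {
            char c = text.charAt(i);
            if (isTokenChar(c)) {
                current.append(normalize(c));
            } else if (current.length() > 0) {
                // Whitespace ends the token collected so far.
                tokens.add(current.toString());
                current.setLength(0);
            }
        }
        if (current.length() > 0) {
            tokens.add(current.toString());
        }
        return tokens;
    }

    public static void main(String[] args) {
        System.out.println(tokenize("Test  ISSUE Title")); // prints [test, issue, title]
    }
}
```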

My search query is ISSUE_TITLE:test, where ISSUE_TITLE is the field and 
test is the term to search for. 

The following code snippet runs the search:

BooleanQuery masterQuery = new BooleanQuery();

masterQuery.add(
    MultiFieldQueryParser.parse(searchQuery, fields, analyzer),
    REQUIRED,
    PROHIBITED);

Here searchQuery is ISSUE_TITLE:test, fields is the array of field names 
(ISSUE_TITLE is one of them), and analyzer is an instance of 
WhitespaceAnalyzerMaestro (shown above).

When I run the search, masterQuery ends up with the following value 
after the snippet above has run: 
+(ISSUE_TITLE:test* ISSUE_TITLE:test* ISSUE_TITLE:test* ISSUE_TITLE:test* 
ISSUE_TITLE:test* ISSUE_TITLE:test* ISSUE_TITLE:test* ISSUE_TITLE:test* 
ISSUE_TITLE:test* ISSUE_TITLE:test* ISSUE_TITLE:test* ISSUE_TITLE:test* 
ISSUE_TITLE:test* ISSUE_TITLE:test* ISSUE_TITLE:test* ISSUE_TITLE:test* 
ISSUE_TITLE:test* ISSUE_TITLE:test* ISSUE_TITLE:test* ISSUE_TITLE:test* 
ISSUE_TITLE:test* ISSUE_TITLE:test* ISSUE_TITLE:test* ISSUE_TITLE:test* 
ISSUE_TITLE:test* ISSUE_TITLE:test* ISSUE_TITLE:test* ISSUE_TITLE:test* 
ISSUE_TITLE:test* ISSUE_TITLE:test*)

I do not think this is correct. Does MultiFieldQueryParser not support 
WhitespaceAnalyzer?
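
One possible reading of the repeated clauses, not confirmed against the 
Lucene source: if the static parse method parses the query string once for 
every entry in fields, and a query that already names its field ignores the 
default field, then each pass yields the same clause. The plain-Java sketch 
below only simulates that assumption; PerFieldParseSketch, parseClause, and 
the field names other than ISSUE_TITLE are hypothetical and are not Lucene 
APIs.

```java
import java.util.ArrayList;
import java.util.List;

/** Plain-Java sketch (no Lucene) of parsing one query string once per field. */
class PerFieldParseSketch {

    /** Hypothetical stand-in for a parser: an explicit "field:" prefix wins over the default field. */
    public static String parseClause(String query, String defaultField) {
        return query.indexOf(':') >= 0 ? query : defaultField + ":" + query;
    }

    /** Builds one clause per entry in fields, as an assumed per-field parse loop would. */
    public static List<String> parsePerField(String query, String[] fields) {
        List<String> clauses = new ArrayList<>();
        for (String field : fields) {
            clauses.add(parseClause(query, field));
        }
        return clauses;
    }

    public static void main(String[] args) {
        // Hypothetical field names; only ISSUE_TITLE appears in the original message.
        String[] fields = {"ISSUE_TITLE", "ISSUE_DESC", "ISSUE_OWNER"};

        // A field-qualified query produces the same clause once per field.
        System.out.println(parsePerField("ISSUE_TITLE:test", fields));
        // prints [ISSUE_TITLE:test, ISSUE_TITLE:test, ISSUE_TITLE:test]

        // An unqualified term is spread across the fields instead.
        System.out.println(parsePerField("test", fields));
        // prints [ISSUE_TITLE:test, ISSUE_DESC:test, ISSUE_OWNER:test]
    }
}
```

If that assumption holds, the number of repeated clauses would equal the 
length of the fields array, and the analyzer would not be the cause. Passing 
the unqualified term, or parsing the field-qualified query with a 
single-field parser, might then avoid the repeats.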

Please help.

Regards
Krishnendra Nandi

 


--=_alternative 0046E6596525722C_=--