lucene-java-user mailing list archives

From "Krishnendra Nandi" <krishnendra.na...@hewitt.com>
Subject Fw: Urgent : Specific search problem with whitespace analyzer
Date Mon, 20 Nov 2006 12:54:15 GMT
Hi,

I am running a "field:text" style search using my own analyzer, which behaves
like WhitespaceAnalyzer. The code for my custom whitespace analyzer and
whitespace tokenizer follows.


// WhitespaceAnalyzerMaestro.java
package com.hewitt.itk.maestro.support.service.simplesearch;

import java.io.Reader;

import org.apache.lucene.analysis.Analyzer;
import org.apache.lucene.analysis.TokenStream;

/** An Analyzer that uses WhitespaceTokenizerMaestro. */

public final class WhitespaceAnalyzerMaestro extends Analyzer {
  public TokenStream tokenStream(String fieldName, Reader reader) {
    return new WhitespaceTokenizerMaestro(reader);
  }
} 



// WhitespaceTokenizerMaestro.java
package com.hewitt.itk.maestro.support.service.simplesearch;

import java.io.Reader;

import org.apache.lucene.analysis.WhitespaceTokenizer;

/** A WhitespaceTokenizerMaestro is a tokenizer that divides text at
 * whitespace. Adjacent sequences of non-whitespace characters form tokens. */

public class WhitespaceTokenizerMaestro extends WhitespaceTokenizer {
  /** Construct a new WhitespaceTokenizerMaestro. */
  public WhitespaceTokenizerMaestro(Reader in) {
    super(in);
  }

  /** Lowercases each token character. CharTokenizer calls normalize(char)
   * on every character it collects; assigning to the parameter inside
   * isTokenChar(char) only changes a local copy and has no effect on the
   * token. isTokenChar is inherited from WhitespaceTokenizer unchanged. */
  protected char normalize(char c) {
    return Character.toLowerCase(c);
  }
}



I have modified the tokenizer class so that it returns characters in
lower case.
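
As a minimal sketch of the intended behaviour, independent of Lucene, the plain-Java snippet below splits text at whitespace and lowercases every character it keeps. The class name LowercaseWhitespaceSplit and its tokenize method are hypothetical, for illustration only; they are not part of the analyzer above.

```java
import java.util.ArrayList;
import java.util.List;

final class LowercaseWhitespaceSplit {
    /** Splits text at whitespace and lowercases each kept character. */
    static List<String> tokenize(String text) {
        List<String> tokens = new ArrayList<>();
        StringBuilder current = new StringBuilder();
        for (int i = 0; i < text.length(); i++) {
            char c = text.charAt(i);
            if (Character.isWhitespace(c)) {
                // Whitespace ends the current token, if there is one.
                if (current.length() > 0) {
                    tokens.add(current.toString());
                    current.setLength(0);
                }
            } else {
                // Lowercasing happens on the character that is stored,
                // not on a throwaway copy.
                current.append(Character.toLowerCase(c));
            }
        }
        if (current.length() > 0) {
            tokens.add(current.toString());
        }
        return tokens;
    }
}
```

Under these assumptions, tokenize("Test  ISSUE Title") yields the tokens test, issue and title.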

My search query is ISSUE_TITLE:test, where ISSUE_TITLE is the field in
which "test" is to be searched.

The following code snippet performs the search:

BooleanQuery masterQuery = new BooleanQuery();

masterQuery.add(
    MultiFieldQueryParser.parse(searchQuery, fields, analyzer),
    REQUIRED,
    PROHIBITED);

Here searchQuery is ISSUE_TITLE:test, fields is the array of field names
(ISSUE_TITLE is one of them), and analyzer is a WhitespaceAnalyzerMaestro
instance (shown above).

When I run the search, the masterQuery produced by the snippet above has
the following value:
+(ISSUE_TITLE:test* ISSUE_TITLE:test* ISSUE_TITLE:test* ISSUE_TITLE:test* 
ISSUE_TITLE:test* ISSUE_TITLE:test* ISSUE_TITLE:test* ISSUE_TITLE:test* 
ISSUE_TITLE:test* ISSUE_TITLE:test* ISSUE_TITLE:test* ISSUE_TITLE:test* 
ISSUE_TITLE:test* ISSUE_TITLE:test* ISSUE_TITLE:test* ISSUE_TITLE:test* 
ISSUE_TITLE:test* ISSUE_TITLE:test* ISSUE_TITLE:test* ISSUE_TITLE:test* 
ISSUE_TITLE:test* ISSUE_TITLE:test* ISSUE_TITLE:test* ISSUE_TITLE:test* 
ISSUE_TITLE:test* ISSUE_TITLE:test* ISSUE_TITLE:test* ISSUE_TITLE:test* 
ISSUE_TITLE:test* ISSUE_TITLE:test*)

I do not think this is correct. Does MultiFieldQueryParser not support
WhitespaceAnalyzer?
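
One possible reading of the repeated clauses, offered as an assumption rather than a confirmed diagnosis: if the static MultiFieldQueryParser.parse(String, String[], Analyzer) parses the query string once per entry in fields and combines the results, then a query that already names its field would produce one identical clause per field. The plain-Java sketch below only simulates that expansion with strings; the class PerFieldExpansionSketch, the method expand, and any field names other than ISSUE_TITLE are hypothetical.

```java
import java.util.ArrayList;
import java.util.List;

final class PerFieldExpansionSketch {
    /** Simulates parsing the same query once per default field and combining the results. */
    static String expand(String query, String[] fields) {
        // Assumption: a term that already names its field ignores the default field.
        boolean hasExplicitField = query.indexOf(':') >= 0;
        List<String> clauses = new ArrayList<>();
        for (String field : fields) {
            // With an explicit field, every pass yields the same clause.
            clauses.add(hasExplicitField ? query : field + ":" + query);
        }
        return "+(" + String.join(" ", clauses) + ")";
    }
}
```

In this simulation, a fields array with three entries turns ISSUE_TITLE:test into three identical ISSUE_TITLE:test clauses, whereas the unqualified query test becomes one clause per field. If that reading applies here, the analyzer would not be the cause of the repetition.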

Please help.

Regards
Krishnendra Nandi

 