lucene-java-user mailing list archives

From "Harini Raghavan" <harini.ragha...@insideview.com>
Subject Query Analyzer Issue
Date Fri, 31 Aug 2007 17:40:16 GMT
Hi Everyone,

I am seeing some strange behaviour with analyzers. I am using SimpleAnalyzer
for some fields in my Compass entity, but I also wrote a custom analyzer
that differs slightly from SimpleAnalyzer, because I wanted to allow digits
as well as letters in the company name column.
The custom analyzer, CompanyNameAnalyzer, uses the following tokenizer:

import java.io.Reader;
import org.apache.lucene.analysis.CharTokenizer;

public class CompanyNameTokenizer extends CharTokenizer {
  /** Construct a new CompanyNameTokenizer. */
  public CompanyNameTokenizer(Reader in) {
    super(in);
  }

  /** Lower-cases every character so that matching is case-insensitive. */
  protected char normalize(char c) {
    return Character.toLowerCase(c);
  }

  /** Collects only characters which are letters, digits or '@'. */
  protected boolean isTokenChar(char c) {
    // Treat '@' as part of a token instead of splitting on it
    return Character.isLetterOrDigit(c) || c == '@';
  }
}
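
As a sanity check, the character rules above can be mimicked without Lucene
at all. The following is only a sketch: TokenRuleSketch and tokenize are
made-up names, not Lucene classes, and it assumes the usual CharTokenizer
behaviour of collecting maximal runs of token characters (the maximum token
length is ignored here). It suggests that these rules by themselves keep
'will' as a token.

```java
import java.util.ArrayList;
import java.util.List;

// Stand-alone sketch: mimics how a CharTokenizer-style rule splits text.
// TokenRuleSketch and tokenize are hypothetical names, not Lucene classes.
public class TokenRuleSketch {

  // Same rule as CompanyNameTokenizer.isTokenChar: letters, digits and '@'.
  static boolean isTokenChar(char c) {
    return Character.isLetterOrDigit(c) || c == '@';
  }

  // Same rule as CompanyNameTokenizer.normalize: lower-case each character.
  static char normalize(char c) {
    return Character.toLowerCase(c);
  }

  // Collect maximal runs of token characters, normalizing each one.
  static List<String> tokenize(String text) {
    List<String> tokens = new ArrayList<String>();
    StringBuilder current = new StringBuilder();
    for (int i = 0; i < text.length(); i++) {
      char c = text.charAt(i);
      if (isTokenChar(c)) {
        current.append(normalize(c));
      } else if (current.length() > 0) {
        // A non-token character ends the current token.
        tokens.add(current.toString());
        current.setLength(0);
      }
    }
    if (current.length() > 0) {
      tokens.add(current.toString());
    }
    return tokens;
  }

  public static void main(String[] args) {
    System.out.println(tokenize("Good Will, Inc.")); // prints [good, will, inc]
    System.out.println(tokenize("info@Acme-42"));    // prints [info@acme, 42]
  }
}
```

Since no stop-word list appears anywhere in these rules, any removal of
'will' would have to come from some other step, for example from whichever
analyzer is applied to the query string.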

But for some reason, when I search for +companyName:good +companyName:will,
the word 'will' is ignored and I get results that match only 'good'. I guess
this means that 'will' is being stripped out as a stop word. I don't
understand why this happens, because the custom analyzer I am using does not
use a StopFilter. So why would stop words be filtered?

I tried looking at the Lucene source too, but had no luck. Any suggestions
would be appreciated.

Thanks,
Harini
