lucene-java-user mailing list archives

From "Mufaddal Khumri" <MKhu...@allegromedical.com>
Subject hyphen not being removed by standard filter
Date Thu, 23 Feb 2006 06:10:17 GMT
Hi,

I might be missing something. I have a custom analyzer, the gist of which is:

	public TokenStream tokenStream(String fieldName, Reader reader)
	{
		TokenStream result = new StandardTokenizer(reader);
		result = new StandardFilter(result);
		result = new LowerCaseFilter(result);
		result = new StopFilter(result, stopSet);
		result = new PorterStemFilter(result);
		return result;
	}

I test the analyzer above with the following query string:
the is EOS-20D canon amazing

In my test code I do this to see what the analyzed query string looks like:
        
		PerFieldAnalyzerWrapper analyzer = new PerFieldAnalyzerWrapper(new StandardStemmingAnalyzer());
		analyzer.addAnalyzer("categoryNames", new KeywordAnalyzer());

		TokenStream stream = analyzer.tokenStream(null, new StringReader(queryString));
		String analyzedQueryString = "";
		
		while(true)
		{
			Token token = stream.next();
			if(token == null)
			{
				break;
			}
		
			analyzedQueryString = analyzedQueryString + token.termText() + " ";
		}
		
		analyzedQueryString = analyzedQueryString.trim();

		log.debug("analyzedQueryString = " + analyzedQueryString);

The output of the log statement above is:

analyzedQueryString = eos-20d canon amaz

I see that the common stop words have been removed, everything has been lowercased, and
the query has been stemmed. Why was the hyphen not removed by the standard filter?
Or does the standard analyzer remove hyphens only from phrases like "eos - 20d" and not from
"eos-20d"?

Thanks.
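
A note on the behaviour asked about, with a possible workaround. As far as the Lucene javadocs of that era describe it, StandardFilter only strips a trailing 's and the dots from acronyms, and StandardTokenizer keeps a hyphenated term that contains a digit (such as "EOS-20D") as a single token, so the hyphen would be expected to survive this chain. The sketch below assumes that hyphens between letters or digits should act as separators, and replaces them with spaces before the text reaches the analyzer. HyphenPreprocessor and splitHyphens are hypothetical names, not part of Lucene.

```java
import java.util.regex.Pattern;

// Hypothetical helper, not part of Lucene: a sketch of pre-processing
// text before it is handed to an analyzer.
final class HyphenPreprocessor {

    // Matches a hyphen that has a letter or digit on both sides.
    // The lookbehind and lookahead are zero-width, so only the hyphen
    // itself is replaced and chains like "a-b-c" are handled fully.
    private static final Pattern INNER_HYPHEN =
            Pattern.compile("(?<=[\\p{L}\\p{N}])-(?=[\\p{L}\\p{N}])");

    private HyphenPreprocessor() {
        // Static utility class; no instances.
    }

    // Replaces each inner hyphen with a single space. Hyphens that are
    // surrounded by whitespace (as in "eos - 20d") are left untouched.
    static String splitHyphens(String text) {
        return INNER_HYPHEN.matcher(text).replaceAll(" ");
    }
}
```

With this in place, the test code could pass HyphenPreprocessor.splitHyphens(queryString) to the StringReader instead of queryString. The same transformation would have to be applied at index time as well, otherwise indexed and queried terms would no longer match.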
