lucene-java-user mailing list archives

Site index · List index
Message view « Date » · « Thread »
Top « Date » · « Thread »
From Donna L Gresh <gr...@us.ibm.com>
Subject MoreLikeThis and stopword stemming
Date Wed, 10 Oct 2007 17:52:55 GMT
What is the appropriate way of achieving both stopwords and stemming of 
stopwords when the MoreLikeThis class is used? My analyzer 
(MoreLikeThis.setAnalyzer) uses the Snowball filter, and is initialized 
with a stopwords set:

analyzer = new StandardAnalyzer(stopwords) {
             public TokenStream tokenStream(String fieldName, 
java.io.Reader reader) {
             return new SnowballFilter(super
.tokenStream(fieldName,reader),
             "English");
             }
};



If I do NOTsupply a separate stopwords list to the MoreLikeThis object 
(that is, using MoreLikeThis.setStopWords), will "the right thing" happen; 
that is, will my input text to the MoreLikeThis object be stemmed and 
(stemmed) stopwords removed before a query is formed? It seems that 
MoreLikeThis.setStopWords uses a simple lookup of words in the stop words 
list (no stemming) which is not what I want.

Thanks in advance
Donna


Donna L. Gresh
Services Research, Mathematical Sciences Department
IBM T.J. Watson Research Center
(914) 945-2472
http://www.research.ibm.com/people/g/donnagresh
gresh@us.ibm.com

Mime
  • Unnamed multipart/alternative (inline, None, 0 bytes)
View raw message