lucene-dev mailing list archives

From Daniel Naber <lucenelist2...@danielnaber.de>
Subject Re: svn commit: r428998 - in /lucene/java/trunk: CHANGES.txt src/java/org/apache/lucene/analysis/StopAnalyzer.java src/test/org/apache/lucene/analysis/TestStandardAnalyzer.java
Date Sat, 05 Aug 2006 20:53:54 GMT
On Saturday 05 August 2006 22:31, Yonik Seeley wrote:

> Stop words and stemming always make literal searching less precise,
> with the general benefit of greater matching power (more general) and
> smaller index size.

That's why I gave the "t-online" example: dropping "t" makes the search
result look incorrect but hardly helps reduce index size. "t" and "s" were
probably added so that "don't" doesn't get indexed as "don" and "t", but
this doesn't happen anyway, as the StandardTokenizer keeps "don't" as a
single token. "'s" is cut off in StandardFilter.
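
To illustrate the point, here is a rough plain-Java sketch. It is not Lucene's actual StandardTokenizer or StandardFilter code; the class name, the regular expression and the two stop word sets are made up for this example, and the only assumption carried over is the behaviour described above (internal apostrophes stay inside a token, a trailing "'s" is stripped, a hyphen splits the token).

```java
import java.util.ArrayList;
import java.util.List;
import java.util.Locale;
import java.util.Set;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

public class StopWordSketch {
    // Simplified stand-in for the tokenizer: runs of letters/digits that may
    // be joined by internal apostrophes. A hyphen ends the token.
    private static final Pattern TOKEN =
            Pattern.compile("[\\p{L}\\p{N}]+(?:'[\\p{L}\\p{N}]+)*");

    static List<String> analyze(String text, Set<String> stopWords) {
        List<String> out = new ArrayList<>();
        Matcher m = TOKEN.matcher(text.toLowerCase(Locale.ROOT));
        while (m.find()) {
            String token = m.group();
            // Simplified stand-in for the filter step: cut off a trailing "'s".
            if (token.endsWith("'s")) {
                token = token.substring(0, token.length() - 2);
            }
            // Stop word removal happens last, on the finished token.
            if (!stopWords.contains(token)) {
                out.add(token);
            }
        }
        return out;
    }

    public static void main(String[] args) {
        // Hypothetical stop lists, with and without the single letters.
        Set<String> withLetters = Set.of("the", "a", "t", "s");
        Set<String> withoutLetters = Set.of("the", "a");

        System.out.println(analyze("don't", withLetters));       // [don't]
        System.out.println(analyze("Daniel's", withLetters));    // [daniel]
        System.out.println(analyze("t-online", withLetters));    // [online]
        System.out.println(analyze("t-online", withoutLetters)); // [t, online]
    }
}
```

Under these assumptions the "t" and "s" entries never fire for "don't" or "Daniel's"; the only visible effect is that "t-online" loses its "t".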

In general, this is only a default list and people will need to adapt it 
anyway. So we should only add the words which are probably stopwords for 
most users.

Regards
 Daniel

-- 
http://www.danielnaber.de
