lucene-java-user mailing list archives

Site index · List index
Message view « Date » · « Thread »
Top « Date » · « Thread »
From Doug Cutting <cutt...@lucene.com>
Subject Re: Phrase query and porter stemmer
Date Thu, 13 Feb 2003 17:56:46 GMT
Mailing Lists Account wrote:
> Doug Cutting wrote:
>>That's because Google and most internet search engines never do any
>>stemming.
> 
> Generally speaking, are there any advantages not to apply the stemmer ?
> Except for certain keywords,I found use of stemmers helpful.

Generally speaking, stemmers increase recall but decrease precision. 
Different stems of a word frequently have slightly different meanings, 
and are used in different contexts: hence the loss of precision. 
Internet search engines are not interested in increasing recall (there 
are usually plenty of matches) rather their problem is increasing 
precision (finding the best matches).  For example, there are enought 
sites explicitly about "cars" that there's no need to conflate these 
with sites that are also about "car".

However, with smaller collections, recall can be a problem and stemming 
can be useful.  A higher percentage of false positives is returned with 
stemming (decreased precision) but in a small collection that's 
acceptable if it also finds a few more relevant documents (increased 
recall).

I suspect the reason that internet search engines do not permit 
wildcards is simply a performance issue: wildcarded terms can be *much* 
more expensive to process, and internet search engines cannot afford them.

Doug


---------------------------------------------------------------------
To unsubscribe, e-mail: lucene-user-unsubscribe@jakarta.apache.org
For additional commands, e-mail: lucene-user-help@jakarta.apache.org


Mime
View raw message