lucene-java-user mailing list archives

Site index · List index
Message view « Date » · « Thread »
Top « Date » · « Thread »
From "Yilmazel, Sibel" <syilma...@navisite.com>
Subject Stemmer algorithms
Date Mon, 13 Feb 2006 18:41:52 GMT
Hello all,

We have done some preliminary research on Porter2 and K-stem algorithms
and have some questions.

Porter2 was found to be a 'strong' stemming algorithm where it strips
off both inflectional suffixes (-s, -es, -ed) and derivational suffixes
(-able, -aciousness, -ability). K-Stem seemed to be a weak stemming
algorithm as it strips off only the inflectional suffixes (-s, -es,
-ed).

In IR, it is usually recommended using a "weak" stemmer, as the "weak"
stemmer seldom hurts performance, but it usually provides significant
improvement with precision.

However, Porter2 is the most widely used stemming algorithm AND it is a
'strong' stemmer which is contrary to what is said above. 

Can you share your ideas, experiences with stemmer algorithms? Thanks in
advance.

Sibel

---------------------------------------------------------------------
To unsubscribe, e-mail: java-user-unsubscribe@lucene.apache.org
For additional commands, e-mail: java-user-help@lucene.apache.org


Mime
View raw message