lucene-java-user mailing list archives

From "Pete Lewis" <>
Subject Re: Stemming Oddness
Date Sat, 06 Nov 2004 10:10:04 GMT
Hi Yousef

You are not doing anything wrong - it's just how the Porter stemmer works!

The problem with Porter is that it does everything in a purely algorithmic way, by
stripping suffixes according to fixed rules, so it doesn't cater for irregular conjugations etc.

Don't worry too much though. As long as you do the same stemming on the query string as you
did while indexing, you should be able to find what you are looking for. You can have some
issues with trailing wildcards, though, because the indexed stem may be shorter than the prefix you type.
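
A minimal sketch of both points, using only the stems reported in this thread. The `stem` method below is a hypothetical toy with hand-picked rules, not the Porter algorithm and not Lucene's API; it also assumes the wildcard prefix is matched against the index without being stemmed.

```java
import java.util.Arrays;
import java.util.HashSet;
import java.util.List;
import java.util.Set;

// Toy illustration only: these rules are NOT Porter's, they just
// reproduce "elephant" -> "eleph" and "buying" -> "bui".
class StemDemo {

    static String stem(String word) {
        String w = word.toLowerCase();
        if (w.endsWith("ing") && w.length() > 5) {
            w = w.substring(0, w.length() - 3);        // buying -> buy
        } else if (w.endsWith("ant") && w.length() > 6) {
            w = w.substring(0, w.length() - 3);        // elephant -> eleph
        }
        if (w.endsWith("y") && w.length() > 2) {
            w = w.substring(0, w.length() - 1) + "i";  // buy -> bui
        }
        return w;
    }

    // "Indexing": store the stemmed form of every word.
    static Set<String> index(List<String> words) {
        Set<String> terms = new HashSet<>();
        for (String w : words) {
            terms.add(stem(w));
        }
        return terms;
    }

    // Term query: stem the query the same way before looking it up.
    static boolean matchesTerm(Set<String> terms, String query) {
        return terms.contains(stem(query));
    }

    // Trailing-wildcard query: the raw prefix is compared to the stored stems.
    static boolean matchesPrefix(Set<String> terms, String rawPrefix) {
        for (String t : terms) {
            if (t.startsWith(rawPrefix)) {
                return true;
            }
        }
        return false;
    }

    public static void main(String[] args) {
        Set<String> terms = index(Arrays.asList("elephant", "buying"));
        System.out.println(matchesTerm(terms, "elephant"));  // true
        System.out.println(matchesTerm(terms, "buy"));       // true
        System.out.println(matchesPrefix(terms, "elepha"));  // false
        System.out.println(matchesPrefix(terms, "ele"));     // true
    }
}
```

The odd-looking stems never reach the user: they only need to be the same on both sides. The prefix "elepha" fails because the stored term "eleph" is shorter than the prefix.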

If you want a better stemmer, look for something that has a dictionary as well as algorithmic
rules. A quick one that is readily available is Kstem, which, while not perfect, I think is
quite a bit better than Porter.
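
To give a rough idea of what "a dictionary as well as algorithmic rules" can mean, here is a small sketch. It is not Kstem's actual algorithm or word list: the `LEXICON` and `IRREGULAR` tables and the single "-ing" rule are made up for illustration.

```java
import java.util.Arrays;
import java.util.HashMap;
import java.util.HashSet;
import java.util.Map;
import java.util.Set;

// Illustrative only: a suffix rule that is accepted only when the
// result is a known word, plus a lookup table for irregular forms.
class DictStemDemo {

    // Hypothetical mini-lexicon of known base forms.
    static final Set<String> LEXICON =
            new HashSet<>(Arrays.asList("elephant", "buy", "walk"));

    // Hypothetical table of irregular forms that no suffix rule can derive.
    static final Map<String, String> IRREGULAR = new HashMap<>();
    static {
        IRREGULAR.put("bought", "buy");
    }

    static String stem(String word) {
        String w = word.toLowerCase();
        if (IRREGULAR.containsKey(w)) {
            return IRREGULAR.get(w);       // irregular form: use the table
        }
        if (LEXICON.contains(w)) {
            return w;                      // already a known word: leave it alone
        }
        if (w.endsWith("ing")) {
            String candidate = w.substring(0, w.length() - 3);
            if (LEXICON.contains(candidate)) {
                return candidate;          // strip only if a real word results
            }
        }
        return w;                          // otherwise keep the word unchanged
    }

    public static void main(String[] args) {
        System.out.println(stem("elephant")); // elephant
        System.out.println(stem("buying"));   // buy
        System.out.println(stem("bought"));   // buy
        System.out.println(stem("thing"));    // thing
    }
}
```

The trade-off is that the result is only as good as the word list, which is why a dictionary-backed stemmer can still miss words it has never seen.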

You can get the source code (Kstem.jar) from the following website:

For more info on Kstem see the paper by its designer Bob Krovetz at:



----- Original Message ----- 
From: "Yousef Ourabi" <>
To: "Lucene Users List" <>
Sent: Saturday, November 06, 2004 1:13 AM
Subject: Stemming Oddness

> Hey,
> Thanks for everyone's replies to my last post. I have
> a question. I imported the PorterStemmer and when I
> did the following
> PorterStemmer ps = new PorterStemmer();
> String r1 = ps.stem("elephant");
> r1 is 'eleph'
> Also, "buying" stems to "bui". Is this normal? Am I doing
> something wrong?
> I am calling reset in between function calls.
> Thanks,
> Yousef