lucene-java-user mailing list archives

From "Pete Lewis" <p...@uptima.co.uk>
Subject Re: Stemming Oddness
Date Sat, 06 Nov 2004 10:10:04 GMT
Hi Yousef

You are not doing anything wrong - it's just how the Porter stemmer works!

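The problem with Porter is that it tries to do everything in a purely algorithmic way, stripping
and rewriting suffixes by rule without checking whether the result is a real word. That is why
"elephant" loses its "-ant" and "buying" ends up as "bui", and it also doesn't cater for
irregular conjugations etc.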

Don't worry too much though. As long as you do the same stemming on the query string as you
did while indexing, you should be able to find what you are looking for. You can still have
some issues with trailing wildcards, because a wildcard term may not match the stemmed form
stored in the index.
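
To make that concrete, here is a minimal Java sketch. It does not use Lucene or the real Porter stemmer: the stem() method is a hypothetical stand-in that hard-codes only the two stems mentioned in this thread, and the "index" is just a set of terms. It also assumes that prefix/wildcard terms are matched against the index without being stemmed first.

```java
import java.util.HashMap;
import java.util.HashSet;
import java.util.Map;
import java.util.Set;

class StemConsistencyDemo {
    // Hypothetical stand-in for a stemmer: only knows the two stems from this thread.
    private static final Map<String, String> STEMS = new HashMap<>();
    static {
        STEMS.put("elephant", "eleph");
        STEMS.put("buying", "bui");
    }

    static String stem(String word) {
        String lower = word.toLowerCase();
        return STEMS.getOrDefault(lower, lower);
    }

    // "Indexing": store the stemmed form of every word.
    static Set<String> index(String... words) {
        Set<String> terms = new HashSet<>();
        for (String w : words) {
            terms.add(stem(w));
        }
        return terms;
    }

    // Term query: the query word is stemmed the same way as the indexed words.
    static boolean termQuery(Set<String> terms, String word) {
        return terms.contains(stem(word));
    }

    // Prefix query (trailing wildcard): the prefix is NOT stemmed (assumption).
    static boolean prefixQuery(Set<String> terms, String prefix) {
        String p = prefix.toLowerCase();
        for (String t : terms) {
            if (t.startsWith(p)) {
                return true;
            }
        }
        return false;
    }

    public static void main(String[] args) {
        Set<String> terms = index("elephant", "buying");
        System.out.println(termQuery(terms, "elephant"));  // true: both sides become "eleph"
        System.out.println(prefixQuery(terms, "elepha"));  // false: "eleph" does not start with "elepha"
        System.out.println(prefixQuery(terms, "ele"));     // true: prefix is shorter than the stem
    }
}
```

So the odd-looking stems are harmless for ordinary term queries, but a prefix that is longer than the stored stem can miss documents you would expect to match.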

If you want a better stemmer, look for something that has a dictionary as well as algorithmic
rules - a quick one that is readily available is Kstem which, while not perfect, I think is
quite a bit better than Porter.
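
As a rough illustration of the "dictionary plus rules" idea, here is a small Java sketch. This is not Kstem's actual algorithm or API; the lexicon and the suffix list are made-up, tiny examples. The point is only that a suffix is stripped when the result is a known word, otherwise the word is left alone.

```java
import java.util.Arrays;
import java.util.HashSet;
import java.util.List;
import java.util.Set;

class DictionaryStemSketch {
    // Hypothetical mini-lexicon; a real one would hold many thousands of words.
    static final Set<String> LEXICON =
            new HashSet<>(Arrays.asList("elephant", "buy"));

    // Hypothetical suffix rules, tried in order.
    static final List<String> SUFFIXES = Arrays.asList("ing", "es", "s");

    static String stem(String word) {
        String w = word.toLowerCase();
        // A word already in the lexicon is kept as it is.
        if (LEXICON.contains(w)) {
            return w;
        }
        // Otherwise accept a suffix rule only if it yields a lexicon word.
        for (String suffix : SUFFIXES) {
            if (w.endsWith(suffix)) {
                String candidate = w.substring(0, w.length() - suffix.length());
                if (LEXICON.contains(candidate)) {
                    return candidate;
                }
            }
        }
        // No rule produced a known word: leave the input unchanged.
        return w;
    }

    public static void main(String[] args) {
        for (String w : new String[] {"elephant", "elephants", "buying"}) {
            System.out.println(w + " -> " + stem(w));
        }
    }
}
```

With this toy lexicon "elephant" stays "elephant", "elephants" becomes "elephant" and "buying" becomes "buy", instead of the "eleph" and "bui" you get from purely rule-based stripping.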

You can get the source code (Kstem.jar) from the following website:

http://ciir.cs.umass.edu/downloads/

For more info on Kstem see the paper by its designer Bob Krovetz at:

http://ciir.cs.umass.edu/pubfiles/ir-35.pdf

Cheers

Pete


----- Original Message ----- 
From: "Yousef Ourabi" <yousef_ourabi@yahoo.com>
To: "Lucene Users List" <lucene-user@jakarta.apache.org>
Sent: Saturday, November 06, 2004 1:13 AM
Subject: Stemming Oddness


> Hey,
> Thanks for everyone's reply to my last post. I have
> a question. I imported the PorterStemmer and when I
> did the following
> 
> PorterStemmer ps = new PorterStemmer();
> String r1 = ps.stem("elephant");
> r1 is 'eleph'
> 
> Also, "buying" stems to "bui". Is this normal? Am I doing
> something wrong?
> 
> I am calling reset in between function calls.
> 
> Thanks,
> Yousef
> 