lucene-java-user mailing list archives

From Andrew Boyd <andrew.b...@mindspring.com>
Subject Re: de pluralization
Date Fri, 05 Aug 2005 13:45:14 GMT
You might want to look at stemming for "de-pluralization". It boils words down to their "root".

So "bombs" and "bombing" both get stemmed to "bomb".

I'm using the Snowball stemmer, which handles several other languages as well as English.
It is in the sandbox:
org.apache.lucene.analysis.snowball.SnowballFilter
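To make the idea of stemming concrete without depending on any particular Lucene release, here is a small plain-Java sketch. It hand-implements only steps 1a and 1b of Martin Porter's suffix-stripping algorithm (the Snowball English stemmer is a later refinement of it). It is not the code behind SnowballFilter, and the class name PluralStemmerSketch is made up for illustration. Step 1a strips plural endings; step 1b strips -eed/-ed/-ing and tidies up the remaining stem.

```java
import java.util.Locale;

/**
 * Hypothetical illustration of Porter-style suffix stripping, steps 1a and 1b only.
 * NOT the Snowball implementation used by SnowballFilter.
 */
final class PluralStemmerSketch {

    // Porter: a consonant is any letter except a, e, i, o, u, and except
    // a 'y' that follows a consonant.
    private static boolean isConsonant(String w, int i) {
        switch (w.charAt(i)) {
            case 'a': case 'e': case 'i': case 'o': case 'u':
                return false;
            case 'y':
                return i == 0 || !isConsonant(w, i - 1);
            default:
                return true;
        }
    }

    // Porter's measure m: number of vowel-to-consonant transitions in the stem.
    private static int measure(String stem) {
        int m = 0;
        boolean prevVowel = false;
        for (int i = 0; i < stem.length(); i++) {
            boolean vowel = !isConsonant(stem, i);
            if (prevVowel && !vowel) m++;
            prevVowel = vowel;
        }
        return m;
    }

    private static boolean containsVowel(String stem) {
        for (int i = 0; i < stem.length(); i++) {
            if (!isConsonant(stem, i)) return true;
        }
        return false;
    }

    private static boolean endsWithDoubleConsonant(String w) {
        int n = w.length();
        return n >= 2 && w.charAt(n - 1) == w.charAt(n - 2) && isConsonant(w, n - 1);
    }

    // Porter's *o condition: ends consonant-vowel-consonant, last letter not w, x or y.
    private static boolean endsCvc(String w) {
        int n = w.length();
        if (n < 3) return false;
        char last = w.charAt(n - 1);
        return isConsonant(w, n - 3) && !isConsonant(w, n - 2) && isConsonant(w, n - 1)
                && last != 'w' && last != 'x' && last != 'y';
    }

    static String stem(String word) {
        String w = word.toLowerCase(Locale.ROOT);
        if (w.length() <= 2) return w; // very short words are left alone

        // Step 1a: plurals (sses -> ss, ies -> i, ss -> ss, s -> "").
        if (w.endsWith("sses") || w.endsWith("ies")) {
            w = w.substring(0, w.length() - 2);
        } else if (w.endsWith("s") && !w.endsWith("ss")) {
            w = w.substring(0, w.length() - 1);
        }

        // Step 1b: -eed, -ed, -ing (longest matching suffix decides).
        boolean tidy = false;
        if (w.endsWith("eed")) {
            if (measure(w.substring(0, w.length() - 3)) > 0) w = w.substring(0, w.length() - 1);
        } else if (w.endsWith("ed") && containsVowel(w.substring(0, w.length() - 2))) {
            w = w.substring(0, w.length() - 2);
            tidy = true;
        } else if (w.endsWith("ing") && containsVowel(w.substring(0, w.length() - 3))) {
            w = w.substring(0, w.length() - 3);
            tidy = true;
        }

        // Tidy-up after -ed/-ing removal: restore a final 'e' or undouble a consonant.
        if (tidy) {
            if (w.endsWith("at") || w.endsWith("bl") || w.endsWith("iz")) {
                w = w + "e";
            } else if (endsWithDoubleConsonant(w)) {
                char last = w.charAt(w.length() - 1);
                if (last != 'l' && last != 's' && last != 'z') w = w.substring(0, w.length() - 1);
            } else if (measure(w) == 1 && endsCvc(w)) {
                w = w + "e";
            }
        }
        return w;
    }

    public static void main(String[] args) {
        for (String w : new String[] {"bombs", "bombing", "bombed", "ponies", "hopping"}) {
            System.out.println(w + " -> " + stem(w));
        }
    }
}
```

With these rules "bombs", "bombing" and "bombed" all reduce to "bomb", "hopping" becomes "hop", and "ponies" becomes "poni". That last one is a reminder that stems are index terms, not dictionary words; in a real index you would use SnowballFilter rather than hand-rolled rules.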

Hope this helps,

Andrew

-----Original Message-----
From: Dan Armbrust <daniel.armbrust.list@gmail.com>
Sent: Aug 5, 2005 8:25 AM
To: java-user@lucene.apache.org
Subject: Re: de pluralization

Mufaddal Khumri wrote:

> Are there analyzers that do this already?
It's not an analyzer, but the "norm" feature of this tool (NLM's Lexical
Variant Generation, LVG) does a good job of reducing words to their normalized form.

http://umlslex.nlm.nih.gov/lvg/current/

http://umlslex.nlm.nih.gov/lvg/current/docs/userDoc/norm.html

Creating an analyzer from it is fairly straightforward.
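The analyzer Dan mentions boils down to a chain: split the text into tokens, lowercase them, and pass each token through the normalizer. The sketch below shows that chain in plain Java rather than against Lucene's Analyzer/TokenFilter classes, whose signatures have changed across releases. The class name is made up, and the normalizer is a hypothetical UnaryOperator backed by a tiny lookup table; it stands in for LVG norm and does not call the LVG API.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.Locale;
import java.util.Map;
import java.util.function.UnaryOperator;

/**
 * Hypothetical plain-Java analogue of an analyzer chain:
 * tokenize -> lowercase -> external normalizer.
 */
final class NormalizingAnalyzerSketch {
    private final UnaryOperator<String> normalizer;

    NormalizingAnalyzerSketch(UnaryOperator<String> normalizer) {
        this.normalizer = normalizer;
    }

    /** Splits on runs of non-letters, lowercases, then normalizes each token. */
    List<String> analyze(String text) {
        List<String> out = new ArrayList<>();
        for (String raw : text.split("[^\\p{L}]+")) {
            if (raw.isEmpty()) continue; // split yields "" if the text starts with a non-letter
            out.add(normalizer.apply(raw.toLowerCase(Locale.ROOT)));
        }
        return out;
    }

    public static void main(String[] args) {
        // Hypothetical stand-in for an LVG "norm" lookup: irregular plurals to singulars.
        Map<String, String> table = Map.of("mice", "mouse", "feet", "foot", "children", "child");
        NormalizingAnalyzerSketch analyzer =
                new NormalizingAnalyzerSketch(t -> table.getOrDefault(t, t));
        System.out.println(analyzer.analyze("Mice, feet and children"));
    }
}
```

For "Mice, feet and children" this prints [mouse, foot, and, child]. Keeping the normalizer behind an interface means the lookup table can be swapped for a call into the real tool without touching the tokenizing code.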


-- 
****************************
Daniel Armbrust
Biomedical Informatics
Mayo Clinic Rochester
daniel.armbrust(at)mayo.edu
http://informatics.mayo.edu/


---------------------------------------------------------------------
To unsubscribe, e-mail: java-user-unsubscribe@lucene.apache.org
For additional commands, e-mail: java-user-help@lucene.apache.org

