lucene-java-user mailing list archives

From Erik Hatcher <e...@ehatchersolutions.com>
Subject Re: Plural Stemming
Date Sat, 02 Apr 2005 00:14:35 GMT

On Apr 1, 2005, at 7:03 PM, Chris Hostetter wrote:

>
> : > > Are there any Lucene extensions that can do simple stemming, i.e. just
> : > > for plurals? Or is the only stemming package available Snowball?
>
> LIA has a case study of jGuru which uses a very specific, home grown
> utility method called "stripEnglishPlural" ... since it's in the case
> study chapter, I'm not sure if it's included in the book's source code,
> but it is included verbatim in the book...
>
>    http://lucenebook.com/search?query=stripEnglishPlural

Thanks for the reminder, Chris.  I'm sure jGuru wouldn't mind us
posting it, so I've pasted it below.  It is not included in the LIA
source code: only the code Otis and I wrote ourselves is included
there, and we didn't get the source code from any of the case studies
(other than Bob Carpenter's LingPipe stuff).

	Erik


/** A useful, but not particularly efficient plural stripper */
public static String stripEnglishPlural(String word) {
    // Too small? STRIP_PLURAL_MIN_WORD_SIZE is a constant defined elsewhere
    // in jGuru's code; its value is not shown in this excerpt.
    if (word.length() < STRIP_PLURAL_MIN_WORD_SIZE) {
        return word;
    }
    // special cases: words ending in 's' that must be left alone
    if (word.equals("has") ||
        word.equals("was") ||
        word.equals("does") ||
        word.equals("goes") ||
        word.equals("dies") ||
        word.equals("yes") ||
        word.equals("gets") || // means too much in java/JSP
        word.equals("its")) {
        return word;
    }
    String newWord = word;
    if (word.endsWith("sses") ||
        word.endsWith("xes") ||
        word.endsWith("hes")) {
        // remove 'es'
        newWord = word.substring(0, word.length() - 2);
    }
    else if (word.endsWith("ies")) {
        // remove 'ies', replace with 'y'
        newWord = word.substring(0, word.length() - 3) + 'y';
    }
    else if (word.endsWith("s") &&
             !word.endsWith("ss") &&
             !word.endsWith("is") &&
             !word.endsWith("us") &&
             !word.endsWith("pos") &&
             !word.endsWith("ses")) {
        // remove 's'
        newWord = word.substring(0, word.length() - 1);
    }
    return newWord;
}
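
For anyone who wants to try the method outside jGuru's code base, here is
a minimal, self-contained sketch. The class name PluralStripperDemo, the
SPECIAL_CASES list, and the value 3 for STRIP_PLURAL_MIN_WORD_SIZE are
assumptions made for illustration only; the real constant's value is not
given above. The rules are the same as in the listing, just condensed.

```java
import java.util.Arrays;
import java.util.List;

// Hypothetical wrapper class, not part of jGuru's or LIA's code.
class PluralStripperDemo {
    // ASSUMPTION: the real value of this constant is not shown in the post.
    static final int STRIP_PLURAL_MIN_WORD_SIZE = 3;

    // The special cases from the original listing, held in a list.
    static final List<String> SPECIAL_CASES = Arrays.asList(
        "has", "was", "does", "goes", "dies", "yes", "gets", "its");

    static String stripEnglishPlural(String word) {
        // Short words and special cases are returned unchanged.
        if (word.length() < STRIP_PLURAL_MIN_WORD_SIZE
                || SPECIAL_CASES.contains(word)) {
            return word;
        }
        if (word.endsWith("sses") || word.endsWith("xes") || word.endsWith("hes")) {
            // remove 'es'
            return word.substring(0, word.length() - 2);
        }
        if (word.endsWith("ies")) {
            // remove 'ies', replace with 'y'
            return word.substring(0, word.length() - 3) + 'y';
        }
        if (word.endsWith("s")
                && !word.endsWith("ss")
                && !word.endsWith("is")
                && !word.endsWith("us")
                && !word.endsWith("pos")
                && !word.endsWith("ses")) {
            // remove 's'
            return word.substring(0, word.length() - 1);
        }
        return word;
    }

    public static void main(String[] args) {
        String[] samples = {
            "classes", "boxes", "searches", "queries",
            "documents", "analysis", "status", "gets", "databases", "movies"
        };
        for (String s : samples) {
            System.out.println(s + " -> " + stripEnglishPlural(s));
        }
    }
}
```

Two things worth noticing when you run it: "databases" comes back
unchanged because of the "ses" exclusion, and "movies" becomes "movy"
because every "ies" ending is rewritten to "y". The method is a heuristic,
so check it against your own vocabulary before relying on it.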

