lucene-java-user mailing list archives

Site index · List index
Message view « Date » · « Thread »
Top « Date » · « Thread »
From Andrzej Bialecki ...@getopt.org>
Subject Re: Newbie: Human Readable Stemming, Lucene Architecture, etc!
Date Fri, 21 Jan 2005 09:38:20 GMT
Morus Walter wrote:
> Owen Densmore writes:
> 
> 
>>1 - I'm a bit concerned that reasonable stemming (Porter/Snowball) 
>>apparently produces non-word stems .. i.e. not really human readable.  
>>(Example: generate, generates, generated, generating  -> generat) 
>>Although in typical queries this is not important because the result of 
>>the search is a document list, it *would* be important if we use the 
>>stems within a graphical navigation interface.
>>     So the question is: Is there a way to have the stemmer produce 
>>english
>>     base forms of the words being stemmed?
>>
> 
> rule based stemmers such as porter/snowball cannot do that.
> But there are (commercial) dictionary based tools that can. E.g. the
> canoo lemmatizer.
> You might also have a look at egothors stemmer, that are word list based.

Egothor stemmers are algorithmic, they only use word lists for training. 
Stems produced by them are usually closer to lemmas than in e.g. 
Porter's stemmer, but there is a significant amount of stems like in the 
example above.




-- 
Best regards,
Andrzej Bialecki
  ___. ___ ___ ___ _ _   __________________________________
[__ || __|__/|__||\/|  Information Retrieval, Semantic Web
___|||__||  \|  ||  |  Embedded Unix, System Integration
http://www.sigram.com  Contact: info at sigram dot com


---------------------------------------------------------------------
To unsubscribe, e-mail: lucene-user-unsubscribe@jakarta.apache.org
For additional commands, e-mail: lucene-user-help@jakarta.apache.org


Mime
View raw message