lucene-dev mailing list archives

From Leo Galambos <galam...@com-os2.ms.mff.cuni.cz>
Subject RE: language identifier contrib
Date Mon, 13 Jan 2003 23:02:24 GMT
On Mon, 13 Jan 2003, Neil Couture wrote:

> I think that the main point is not to lose information, and with stemmers
> you do lose it: if you look at a stemmer as a mathematical function, it's
> obvious that it is surjective but not bijective (several words map to the
> same stem, so the original word cannot be recovered from the stem).
> If you lose information, this will have an impact on your precision and recall.
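
To make the "not bijective" point concrete, here is a minimal sketch using a hypothetical toy suffix-stripping stemmer (`toy_stem` and its `SUFFIXES` list are assumptions for illustration, not Porter or any Lucene analyzer). Inverting it shows several surface forms collapsing onto one stem, i.e. the mapping is not injective:

```python
from collections import defaultdict

# Hypothetical suffix list for a toy stemmer (not Porter, not a Lucene analyzer).
# Longer suffixes come first so "ing" is tried before "s" or "e".
SUFFIXES = ("ing", "es", "ed", "s", "e")


def toy_stem(word: str) -> str:
    """Strip the first matching suffix, keeping a stem of at least 3 letters."""
    for suffix in SUFFIXES:
        if word.endswith(suffix) and len(word) - len(suffix) >= 3:
            return word[: -len(suffix)]
    return word


def group_by_stem(words):
    """Invert the stemmer: map each stem to every word that produced it."""
    groups = defaultdict(list)
    for word in words:
        groups[toy_stem(word)].append(word)
    return dict(groups)


if __name__ == "__main__":
    words = ["connect", "connected", "connecting", "connects"]
    # One stem with four preimages: the original surface form cannot be
    # recovered from the stem alone.
    print(group_by_stem(words))
```

Any stem with more than one word in its group is exactly where the information Neil mentions is lost.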

How can you lose something when you cannot use it? How can you calculate
similarity for two terms (in a document), e.g. came/ and coming/, when a
user asks for come/? How can you implement it without a neural network
model?

If you also use methods of "cognitive science", then a lemmatizer is
better, of course...
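
A minimal sketch of the came/coming/come case, assuming a hypothetical lookup table `LEMMAS` as a stand-in for a lemmatizer's morphological dictionary and the same hypothetical toy suffix stripper as a comparison. Suffix stripping can conflate "coming" and "comes" with "come", but has no rule that maps the irregular form "came"; a dictionary lookup does:

```python
# Hypothetical lookup table standing in for a lemmatizer's morphological
# dictionary; a real one would cover the whole vocabulary of the language.
LEMMAS = {"came": "come", "comes": "come", "coming": "come"}

# Hypothetical toy suffix-stripping stemmer, used only for comparison.
SUFFIXES = ("ing", "es", "ed", "s", "e")


def toy_stem(word: str) -> str:
    """Strip the first matching suffix, keeping a stem of at least 3 letters."""
    for suffix in SUFFIXES:
        if word.endswith(suffix) and len(word) - len(suffix) >= 3:
            return word[: -len(suffix)]
    return word


def lemmatize(word: str) -> str:
    """Dictionary lookup; unknown words are returned unchanged."""
    return LEMMAS.get(word, word)


def matching_terms(query, doc_terms, normalize):
    """Return the document terms whose normalized form equals the query's."""
    target = normalize(query)
    return [term for term in doc_terms if normalize(term) == target]


if __name__ == "__main__":
    doc = ["came", "coming", "comes"]
    print("stemmer:   ", matching_terms("come", doc, toy_stem))
    print("lemmatizer:", matching_terms("come", doc, lemmatize))
```

Here the toy stemmer matches only "coming" and "comes" for the query "come", while the lookup also finds "came"; the cost is that the dictionary has to exist and be maintained for each language.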

> Needless to say, stemmers may not be so bad for a language like
> English, which does not have a very complex morphology. But this is
> not the case for, say, French.

It may improve recall, but precision would go down if you do not use
cognitive science methods, IMHO. In that case, light stemmers would be better.
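
A minimal sketch of the recall/precision trade-off between a light and an aggressive stemmer. Both stemmers are hypothetical toys (the plural-only rules of `light_stem` and the `AGGRESSIVE_SUFFIXES` list are assumptions, not Lucene's or Snowball's rules), and English words are used only for readability. The aggressive one merges "universe" and "university" into one index term, so a query for one also retrieves the other; the light one still conflates plurals but keeps the two words apart:

```python
from collections import defaultdict


def light_stem(word: str) -> str:
    """Plural-only rules (hypothetical): -ies -> -y, otherwise drop a final -s."""
    if word.endswith("ies") and len(word) > 4:
        return word[:-3] + "y"
    if word.endswith("s") and not word.endswith("ss") and len(word) > 3:
        return word[:-1]
    return word


# Hypothetical aggressive suffix list that also strips derivational endings.
AGGRESSIVE_SUFFIXES = ("ities", "ity", "es", "s", "e")


def aggressive_stem(word: str) -> str:
    """Strip the first matching suffix, keeping a stem of at least 3 letters."""
    for suffix in AGGRESSIVE_SUFFIXES:
        if word.endswith(suffix) and len(word) - len(suffix) >= 3:
            return word[: -len(suffix)]
    return word


def conflation_classes(words, stem):
    """Group words that a stemmer treats as the same index term."""
    classes = defaultdict(list)
    for word in words:
        classes[stem(word)].append(word)
    return dict(classes)


if __name__ == "__main__":
    words = ["universe", "universes", "university", "universities"]
    print("light:     ", conflation_classes(words, light_stem))
    print("aggressive:", conflation_classes(words, aggressive_stem))
```

The light stemmer yields two classes ({universe, universes} and {university, universities}); the aggressive one yields a single class, which is the extra recall bought at the price of precision.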

-g-



