lucene-java-user mailing list archives

Site index · List index
Message view « Date » · « Thread »
Top « Date » · « Thread »
From "gekkokid" ...@gekkokid.org.uk>
Subject Re: What is stemming?
Date Sun, 20 Nov 2005 17:48:49 GMT
Hello fellow lucener :), firstly im no tutor but i will try my best to 
explain, if anyone believes im wrong please state it so our friend doesnt 
get the wrong idea,

here it goes. O_o
stemming is reducing the word to the root form, where lemmatisation is 
concerned with linguistics i believe

lemmatisation is "go", "gone", "going", "goes", "been" and "went"

where stemming a word would be reducing a word from "gone" to "go", so it 
can be matched to other stemmed words such as "going", as "going" stemmed 
would be "go" also,

a better example.
"engineering", "engineers", "engineered", "engineer"

these four words would not match up if they were tested for equality, 
however by stemming these words we can reduce them to a more basic form,

engineering --> engineer
engineers --> engineer
engineered --> engineer
engineer --> engineer

now we have stemmed words they will match for equality, so now if i try 
searching using the word for engineer, documents on engineering, engineers 
and engineered would be returned from a stemmed index/database.


I hope that helps. Do a search for "porter stemmer" for more information.

_gk

----- Original Message ----- 
From: "Koji Sekiguchi" <koji.sekiguchi@m4.dion.ne.jp>
To: <java-user@lucene.apache.org>
Sent: Sunday, November 20, 2005 3:48 PM
Subject: What is stemming?


> Hello, Luceners!
>
> What is "stemming"?
>
> I have Lucene in Action and found the following definitions on page 103:
>
> - reducing words to a root form (stemming)
> - changing words into the basic form (lemmatization)
>
> but I cannot see the difference between them.
> I'm also confused by the following words on page 71:
>
> - convert terms to base word forms (stemming)
>
> It seems "lemmatization" to me...
>
> Could someone explain what "stemming" is?
> Some illustrative cases would be very appreciated.
> My mother tongue is not English.
>
> Thank you,
>
> Koji
>
>
>
>
> ---------------------------------------------------------------------
> To unsubscribe, e-mail: java-user-unsubscribe@lucene.apache.org
> For additional commands, e-mail: java-user-help@lucene.apache.org
>
> 


---------------------------------------------------------------------
To unsubscribe, e-mail: java-user-unsubscribe@lucene.apache.org
For additional commands, e-mail: java-user-help@lucene.apache.org


Mime
View raw message