lucene-java-user mailing list archives

From Yuval Kesten <ykes...@yahoo-inc.com>
Subject RE: Indexing with Semantics
Date Thu, 03 May 2012 08:59:07 GMT
Hi,
The logic you are looking for is lemmatization: http://en.wikipedia.org/wiki/Lemmatisation.
I don't think Lucene has a built-in lemmatizer, but you can use GATE, which is an open source
project:
http://gate.ac.uk
http://gate.ac.uk/gate/doc/plugins.html
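
To illustrate the idea, here is a minimal sketch in plain Java (no Lucene or GATE calls). It assumes a tiny hand-written lemma table, which is purely hypothetical and stands in for whatever a real lemmatizer would return; the class and method names are made up for this example. Tokens are mapped to their lemma before counting, so "owe", "owed" and "owing" collapse into one term, and cosine similarity is then computed on the resulting term-frequency maps.

```java
import java.util.HashMap;
import java.util.Locale;
import java.util.Map;

class LemmaTermFrequency {
    // Hypothetical lemma table for illustration only.
    // A real lemmatizer would replace this lookup.
    static final Map<String, String> LEMMAS = new HashMap<>();
    static {
        LEMMAS.put("owed", "owe");
        LEMMAS.put("owing", "owe");
    }

    // Returns the lemma for a token, or the lowercased token if none is known.
    static String lemma(String token) {
        String lower = token.toLowerCase(Locale.ROOT);
        return LEMMAS.getOrDefault(lower, lower);
    }

    // Counts term frequencies after mapping every token to its lemma.
    static Map<String, Integer> termFrequencies(String text) {
        Map<String, Integer> tf = new HashMap<>();
        for (String token : text.split("\\W+")) {
            if (token.isEmpty()) {
                continue;
            }
            tf.merge(lemma(token), 1, Integer::sum);
        }
        return tf;
    }

    // Cosine similarity between two term-frequency maps.
    static double cosine(Map<String, Integer> a, Map<String, Integer> b) {
        double dot = 0.0, normA = 0.0, normB = 0.0;
        for (Map.Entry<String, Integer> e : a.entrySet()) {
            int va = e.getValue();
            normA += va * va;
            Integer vb = b.get(e.getKey());
            if (vb != null) {
                dot += va * vb;
            }
        }
        for (int vb : b.values()) {
            normB += vb * vb;
        }
        if (normA == 0.0 || normB == 0.0) {
            return 0.0; // an empty document is treated as similar to nothing
        }
        return dot / (Math.sqrt(normA) * Math.sqrt(normB));
    }

    public static void main(String[] args) {
        System.out.println(termFrequencies("owe owed owing")); // prints {owe=3}
    }
}
```

With a real lemmatizer, only the lemma() lookup would change; the counting and the similarity calculation stay the same.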

Enjoy!



-----Original Message-----
From: Kasun Perera [mailto:kasunp@opensource.lk] 
Sent: Saturday, April 28, 2012 6:03 AM
To: java-user@lucene.apache.org
Subject: Indexing with Semantics

I'm using Lucene's term frequency vectors to calculate cosine similarity between documents. Say
my documents have these 3 terms: "owe", "owed", "owing". Lucene treats these as 3 separate terms,
but all 3 of them mean the same thing, "owe". Is there any functionality in Lucene that can be used
to index by semantics, so that it indexes "owe", "owed", "owing" as one word "owe" with term
frequency = 3?

If not, I'd welcome any suggestions for achieving this.

--
Regards

Kasun Perera
