lucene-java-user mailing list archives

From Yuval Kesten
Subject RE: Indexing with Semantics
Date Thu, 03 May 2012 08:59:07 GMT
The logic you are looking for is lemmatization.
I don't think Lucene has a built-in lemmatizer, but you can use GATE, which is open source.
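
To make the idea concrete, here is a minimal sketch of lemmatizing tokens before counting term frequencies. It uses neither Lucene nor GATE: the class name, the method names, and the tiny hand-written lemma table are all assumptions for illustration, standing in for whatever a real lemmatizer would return.

```java
import java.util.HashMap;
import java.util.Locale;
import java.util.Map;

// Hypothetical helper for illustration; not part of Lucene or GATE.
class LemmaTermFrequency {

    // Toy lemma dictionary (an assumption). A real lemmatizer would supply
    // these mappings from a full lexicon instead of a hard-coded table.
    static final Map<String, String> LEMMAS = Map.of(
            "owed", "owe",
            "owing", "owe",
            "owes", "owe");

    // Map a token to its lemma, falling back to the lower-cased token itself.
    static String lemmatize(String token) {
        String lower = token.toLowerCase(Locale.ROOT);
        return LEMMAS.getOrDefault(lower, lower);
    }

    // Count term frequencies after every token has been replaced by its lemma.
    static Map<String, Integer> termFrequencies(String text) {
        Map<String, Integer> tf = new HashMap<>();
        for (String token : text.split("\\W+")) {
            if (token.isEmpty()) {
                continue; // split() can yield a leading empty string
            }
            tf.merge(lemmatize(token), 1, Integer::sum);
        }
        return tf;
    }

    public static void main(String[] args) {
        // The three inflected forms collapse into one term with frequency 3.
        System.out.println(termFrequencies("owe owed owing")); // prints {owe=3}
    }
}
```

In an actual index, the same normalization would have to be applied to the token stream at both index time and query time, so that the stored terms and the query terms agree.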


-----Original Message-----
From: Kasun Perera
Sent: Saturday, April 28, 2012 6:03 AM
Subject: Indexing with Semantics

I'm using Lucene's term frequency vectors to calculate cosine similarity between documents. Say
my documents contain these 3 terms: "owe", "owed", "owing". Lucene treats these as 3 separate terms,
but all 3 of them mean the same thing, "owe". Is there any functionality in Lucene that can be used
to index by semantics, so that it indexes "owe", "owed", "owing" as one word "owe" with term
frequency = 3?

If not, I'd welcome any suggestions for achieving this.
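
For context, the sketch below shows why merging inflected forms matters for the cosine similarity described above. It works on plain term-frequency maps rather than on Lucene's term vector API, and the class name, method names, and sample documents are assumptions made up for illustration.

```java
import java.util.Map;

// Hypothetical helper for illustration; it does not use Lucene's API.
class CosineSimilarity {

    // Euclidean length of a term-frequency vector.
    static double norm(Map<String, Integer> v) {
        double sum = 0.0;
        for (int count : v.values()) {
            sum += (double) count * count;
        }
        return Math.sqrt(sum);
    }

    // Cosine similarity: dot product over shared terms divided by both norms.
    static double cosine(Map<String, Integer> a, Map<String, Integer> b) {
        double dot = 0.0;
        for (Map.Entry<String, Integer> e : a.entrySet()) {
            Integer other = b.get(e.getKey());
            if (other != null) {
                dot += (double) e.getValue() * other;
            }
        }
        double na = norm(a);
        double nb = norm(b);
        if (na == 0.0 || nb == 0.0) {
            return 0.0; // an empty document is similar to nothing
        }
        return dot / (na * nb);
    }

    public static void main(String[] args) {
        // Raw terms: "owe" and "owed" are different dimensions, so only
        // "money" is shared and the similarity is about 0.5.
        double raw = cosine(Map.of("owe", 1, "money", 1),
                            Map.of("owed", 1, "money", 1));

        // After mapping "owed" to "owe", the vectors coincide and the
        // similarity is about 1.0.
        double merged = cosine(Map.of("owe", 1, "money", 1),
                               Map.of("owe", 1, "money", 1));

        System.out.println("raw=" + raw + " merged=" + merged);
    }
}
```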


Kasun Perera
