lucene-java-user mailing list archives

From julien Blaize <julien.bla...@gmail.com>
Subject How to use Hunspell dictionary to do the reverse of stemming ?
Date Tue, 24 Oct 2017 15:04:33 GMT
Hello,

I am looking for a way to efficiently do the reverse of stemming.
Example: if I give the program the verb "drug", it should return
"drugged", "drugging", "drugs", "drugstore", etc.

I have used the wordforms program from Hunspell to generate all possible
combinations of the input word (including the ridiculous ones that do not
match a real word). Then I use the org.apache.lucene.analysis.hunspell.Dictionary
class to check whether each generated word exists and maps back to the original word.
This is very slow and inefficient.
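
A minimal sketch of this generate-and-filter approach, for illustration only. It assumes the candidate forms have already been generated, and it replaces the Hunspell lookup with a plain `Function<String, List<String>>` that returns the stems of a word (empty if the word is unknown). The class name `ReverseStemIndex`, the `build` method, and the toy data are hypothetical and are not part of Lucene or Hunspell.

```java
import java.util.List;
import java.util.Map;
import java.util.Set;
import java.util.TreeMap;
import java.util.TreeSet;
import java.util.function.Function;

public class ReverseStemIndex {

    /**
     * Groups candidate surface forms by the stem(s) the stemmer assigns to them.
     * Candidates for which the stemmer returns no stem are dropped.
     */
    public static Map<String, Set<String>> build(
            Iterable<String> candidates,
            Function<String, List<String>> stemmer) {
        Map<String, Set<String>> formsByStem = new TreeMap<>();
        for (String candidate : candidates) {
            for (String stem : stemmer.apply(candidate)) {
                formsByStem.computeIfAbsent(stem, k -> new TreeSet<>()).add(candidate);
            }
        }
        return formsByStem;
    }

    public static void main(String[] args) {
        // Toy stand-in for the dictionary lookup: a hard-coded form -> stems table.
        // A real setup would call the Hunspell-backed stemmer here instead.
        Map<String, List<String>> toy = Map.of(
                "drugs", List.of("drug"),
                "drugged", List.of("drug"),
                "drugging", List.of("drug"));
        Function<String, List<String>> stemmer = w -> toy.getOrDefault(w, List.of());

        // Candidates include nonsense forms, which the lookup filters out.
        List<String> candidates =
                List.of("drugs", "drugged", "drugging", "druger", "drugly");

        Map<String, Set<String>> index = build(candidates, stemmer);
        System.out.println(index.get("drug"));
    }
}
```

Building the map once and querying it by stem avoids repeating the generate-and-check step for every lookup, though the cost of generating and stemming every candidate is still paid up front.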

I was looking at the internals of the Dictionary class and saw that it uses
patterns and FSTs (finite state transducers). This seems a very efficient way to
look up the stem of a word, but I was unable to find a way to do the
reverse operation.

I am wondering whether anyone has tried to do something similar. Can someone
who understands FSTs and the use of patterns in the Dictionary class give
me hints on whether what I am trying to do is possible and would be efficient?

Kind Regards.

--
Julien Blaize
