lucene-java-user mailing list archives

From Chris Hostetter <>
Subject RE: fetching similar wordlist as given word
Date Wed, 24 Nov 2004 08:53:52 GMT

:   > can I get the similar wordlist as output. so that I can show the end
:   > user in the column  ---------------   do you mean "foam"?
:   > How can I get similar word list in the given content?

This is a nontrivial problem, because the definition of "similar" is
subject to interpretation.  I would look into various dictionary
implementations, and see if you can find a good Java-based dictionary that
can suggest alternatives based on an input string.

Once you have that, you should be able to use IndexSearcher.docFreq
to find out how many docs contain each alternate word, and compare that
with the number of docs that contain the initial word ... if one of the
alternates has a significantly higher number of matches, then you suggest
that alternate to the user.
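
The frequency comparison described above can be sketched roughly as
follows. This is only an illustration, not tested against any particular
Lucene release: the class name SuggestionPicker, the pick method, and the
"factor" threshold for "significantly higher" are all made up here, and
the document-frequency lookup is passed in as a plain function so the
sketch does not depend on a specific Lucene API.

```java
import java.util.List;
import java.util.function.ToIntFunction;

// Hypothetical helper, not part of Lucene.
public class SuggestionPicker {

    /**
     * Returns the most frequent alternate if its document frequency is
     * more than factor times that of the original word, else null.
     *
     * docFreq is an assumed lookup from a word to the number of documents
     * containing it (for example, a wrapper around the index's docFreq call).
     */
    public static String pick(String original, List<String> alternates,
                              ToIntFunction<String> docFreq, double factor) {
        int originalFreq = docFreq.applyAsInt(original);

        // Find the alternate that appears in the most documents.
        String best = null;
        int bestFreq = 0;
        for (String alt : alternates) {
            int freq = docFreq.applyAsInt(alt);
            if (freq > bestFreq) {
                best = alt;
                bestFreq = freq;
            }
        }

        // Only suggest when the alternate is "significantly" more common.
        // If no alternate occurs in the index at all, best is still null.
        if (best != null && bestFreq > originalFreq * factor) {
            return best;
        }
        return null;
    }
}
```

With an index at hand, the docFreq argument would wrap something like
searcher.docFreq(new Term(field, word)), where the field name is whatever
your documents use. Depending on the Lucene version, that lookup may be
exposed on IndexReader rather than IndexSearcher, so check the javadocs
for the release you are running.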

NOTE: The DICT protocol defines a client/server approach to providing
spell correction and definitions.  Maybe you can leverage some of the
spell correction code mentioned in the "Server Software Written in Java"
section of this doc...
In particular, you might want to take a look at JavaDict's Database.match
function using the LevenshteinStrategy...,
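
For reference, a Levenshtein-based match boils down to computing the edit
distance between the input and each candidate word and keeping the close
ones. The sketch below is a generic version of that idea, not JavaDict's
actual code: the class name EditDistance, the method names, and the
maxDistance cutoff are assumptions made for illustration.

```java
import java.util.ArrayList;
import java.util.List;

// Hypothetical helper; not taken from JavaDict or Lucene.
public class EditDistance {

    /** Minimum number of single-character inserts, deletes, or substitutions turning a into b. */
    public static int levenshtein(String a, String b) {
        int[] prev = new int[b.length() + 1];
        int[] curr = new int[b.length() + 1];

        // Distance from the empty prefix of a to each prefix of b.
        for (int j = 0; j <= b.length(); j++) {
            prev[j] = j;
        }

        for (int i = 1; i <= a.length(); i++) {
            curr[0] = i; // delete all i characters of a's prefix
            for (int j = 1; j <= b.length(); j++) {
                int cost = (a.charAt(i - 1) == b.charAt(j - 1)) ? 0 : 1;
                curr[j] = Math.min(
                        Math.min(curr[j - 1] + 1,  // insertion
                                 prev[j] + 1),     // deletion
                        prev[j - 1] + cost);       // substitution or match
            }
            // Reuse the two rows instead of allocating a full matrix.
            int[] tmp = prev;
            prev = curr;
            curr = tmp;
        }
        return prev[b.length()];
    }

    /** Words from vocabulary within maxDistance edits of word, excluding word itself. */
    public static List<String> similar(String word, List<String> vocabulary, int maxDistance) {
        List<String> matches = new ArrayList<>();
        for (String candidate : vocabulary) {
            if (!candidate.equals(word) && levenshtein(word, candidate) <= maxDistance) {
                matches.add(candidate);
            }
        }
        return matches;
    }
}
```

Scanning the whole vocabulary like this is linear in the number of words,
which is probably fine for a modest dictionary but may need something
smarter for a large index. The resulting list is what you would then rank
by document frequency.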
