lucene-java-user mailing list archives

From: Kelvin Tan
Subject: Re: sounds like spellcheck
Date: Wed, 09 Feb 2005 13:03:24 GMT
Hey Aad, I believe has a link to Phonetix
(, an LGPL-licensed
lib for phonetic algorithms like Soundex, Metaphone and DoubleMetaphone. There are Lucene

As to the suitability of the algorithms, I haven't taken a look at the Phonetix implementation,
but if is anything to go
by (do a search for "dutch"), then it should meet your needs, or at least won't be difficult
to customize. 

Is that what you're looking for?


On Wed, 09 Feb 2005 13:23:57 +0100, Aad Nales wrote:
> In my Clipper days I could build an index on English words using a
> technique that was called soundex. Searching in that index resulted
> in hits of words that sounded the same. From what I remember this
> technique only worked for English. Has it ever been generalized?
> What I am trying to solve is this. A customer is looking for a
> solution to spelling mistakes made by children (up to age 10) when
> typing in queries. The site is Dutch. Common mistakes are 'sgool'
> when searching for 'school'. The 'normal' spellcheckers and
> suggestors typically generate a list where the 'sounds like'
> candidates are too far away from the result. So what I am thinking
> about doing is this:
> 1. create a parser that takes a word and creates a sound index entry.
> 2. create a list of 'correctly' spelled words either based on the
> index of the website or on some kind of dictionary.
> 2a. perhaps create an n-gram index based on these words
> 3. accept a query, figure out that a spelling mistake has been made
> 3a. find alternatives by parsing the query and searching the 'sounds
> like' index, and then calculate and order the results
> Steps 2 and 3 have been discussed at length in this forum and have
> even made it to the sandbox. What I am left with is 1.
> My thinking is to process a series of replacement statements that
> go like:
>   g sounds like ch if the immediate predecessor is an s.
>   o sounds like oo if the immediate predecessor is a consonant.
> But before I take this to the next step I am wondering if anybody
> has created or thought up alternative solutions?
> Cheers,
> Aad
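
The replacement statements in step 1 above could be sketched roughly as follows, in Java since that is what Lucene uses. This is only an illustration under stated assumptions: the class name DutchSoundKey, the method encode, and the regular expressions are hypothetical and come from neither Phonetix nor Lucene, and the rule table holds just the two rules quoted above, not a complete Dutch phonetic model.

```java
import java.util.Locale;
import java.util.regex.Pattern;

// Hypothetical helper: turns a word into a "sounds like" key by applying
// context-sensitive replacement rules in a fixed order.
final class DutchSoundKey {

    // Each rule is {regex, replacement}. The lookbehind (?<=...) checks the
    // immediate predecessor without consuming it.
    private static final String[][] RULES = {
        // g sounds like ch if the immediate predecessor is an s
        {"(?<=s)g", "ch"},
        // o (or an existing oo) after a consonant is written as oo
        {"(?<=[b-df-hj-np-tv-z])o+", "oo"},
    };

    private static final Pattern[] PATTERNS = compile(RULES);

    private static Pattern[] compile(String[][] rules) {
        Pattern[] patterns = new Pattern[rules.length];
        for (int i = 0; i < rules.length; i++) {
            patterns[i] = Pattern.compile(rules[i][0]);
        }
        return patterns;
    }

    // Lower-cases the word, then applies every rule once, in table order.
    static String encode(String word) {
        String key = word.toLowerCase(Locale.ROOT);
        for (int i = 0; i < PATTERNS.length; i++) {
            key = PATTERNS[i].matcher(key).replaceAll(RULES[i][1]);
        }
        return key;
    }

    public static void main(String[] args) {
        System.out.println(encode("sgool"));  // prints "school"
        System.out.println(encode("school")); // prints "school"
    }
}
```

With these two rules 'sgool' and 'school' end up with the same key, so the key could be stored next to each correctly spelled word (step 2) and looked up for the misspelled query (step 3a). Note that the second rule deliberately conflates short and long o, so it will also merge words that a stricter scheme would keep apart, and the order of the rules matters once more of them are added.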