lucene-java-user mailing list archives

Site index · List index
Message view « Date » · « Thread »
Top « Date » · « Thread »
From Kelvin Tan <kelvin-li...@relevanz.com>
Subject Re: sounds like spellcheck
Date Wed, 09 Feb 2005 13:03:24 GMT
Hey Aad, I believe http://jakarta.apache.org/lucene/docs/contributions.html has a link to Phonetix
(http://www.companywebstore.de/tangentum/mirror/en/products/phonetix/index.html), an LGPL-licensed
lib for phonetic algorithms like Soundex, Metaphone and DoubleMetaphone. There are Lucene
adapters.

As to the suitability of the algorithms, I haven't taken a look at the Phonetix implementation,
but if http://spottedtiger.tripod.com/D_Language/D_DoubleMetaPhone.html is anything to go
by (do a search for "dutch"), then it should meet your needs, or at least won't be difficult
to customize. 

Is that what you're looking for?

k

On Wed, 09 Feb 2005 13:23:57 +0100, Aad Nales wrote:
> In my Clipper days I could build an index on English words using a
> technique that was called soundex. Searching in that index resulted
> in hits of words that sounded the same. From what i remember this
> technique only worked for English. Has it ever been generalized?
>
> What i am trying to solve is this. A customer is looking for a
> solution to spelling mistakes made by children (upto 10) when
> typing in queries. The site is Dutch. Common mistakes are 'sgool'
> when searching for 'school'. The 'normal' spellcheckers and
> suggestors typically generate a list where the 'sounds like'
> candidates' are too far away from the result. So what I am thinking
> about doing is this:
>
> 1. create a parser that takes a word and creates a soundindex entry.
>
> 2. create list of 'correctly' spelled words either based on the
> index of the website or on some kind of dictionary.
> 2a. perhaps create a n-gram index based on these words
>
> 3. accept a query, figure out that a spelling mistake has been made
> 3a find alternatives by parsing the query and searching the 'sound
> like index' and then calculate and order  the results
>
> Steps 2 and 3 have been discussed at length in this forum and have
> even made it to the sandbox. What I am left with is 1.
>
> My thinking is processing a series of replacement statements that
> go like: --
> g sounds like ch if the immediate predecessor is an s. o sounds
> like oo if the immediate predecessor is a consonant --
>
> But before I takes this to the next step I am wondering if anybody
> has created or thought up alternative solutions?
>
> Cheers,
> Aad
>
>
> --------------------------------------------------------------------
> - To unsubscribe, e-mail: lucene-user-
> unsubscribe@jakarta.apache.org For additional commands, e-mail:
> lucene-user-help@jakarta.apache.org



---------------------------------------------------------------------
To unsubscribe, e-mail: lucene-user-unsubscribe@jakarta.apache.org
For additional commands, e-mail: lucene-user-help@jakarta.apache.org


Mime
View raw message