lucene-java-user mailing list archives

From "Jon Crowell" <jcrow...@dsg.harvard.edu>
Subject RE: misspelled queries
Date Thu, 26 Jun 2003 20:08:09 GMT
GSpell is an open source Java spell checking API.  It can be found at
http://umlslex.nlm.nih.gov/nlsRepository/gspell/doc/userDoc/

It incorporates both Metaphone (which is similar to SoundEx, I think) and
n-gram algorithms, and it is easy to use.

I currently have an application in which a user submits a query to Lucene,
and along the way I use GSpell to check all the terms in the query.  If any
are misspelled, I underline them with a squiggly red line and offer spelling
suggestions from GSpell when the user right-clicks.

If your spelling correction dictionary is exactly the set of terms in your
index, then any word flagged as misspelled is guaranteed not to yield any
hits, and no indexed term will ever be flagged as misspelled.
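
To make the point above concrete, here is a minimal sketch of a checker whose dictionary is just the set of index terms. It does not use the GSpell API; the class name IndexTermSpellSketch and its methods are hypothetical, and plain edit distance stands in for the Metaphone and n-gram matching that GSpell provides.

```java
import java.util.ArrayList;
import java.util.Collection;
import java.util.Comparator;
import java.util.List;
import java.util.Locale;
import java.util.Set;
import java.util.TreeSet;

// Hypothetical sketch: the spelling dictionary is exactly the set of index terms.
final class IndexTermSpellSketch {
    private final Set<String> dictionary = new TreeSet<>();

    IndexTermSpellSketch(Collection<String> indexTerms) {
        for (String term : indexTerms) {
            dictionary.add(term.toLowerCase(Locale.ROOT));
        }
    }

    // A term that is not in the dictionary cannot match anything in the index.
    boolean isKnown(String term) {
        return dictionary.contains(term.toLowerCase(Locale.ROOT));
    }

    // Suggest index terms within maxDistance edits, closest first.
    List<String> suggest(String term, int maxDistance) {
        final String needle = term.toLowerCase(Locale.ROOT);
        List<String> result = new ArrayList<>();
        for (String candidate : dictionary) {
            if (editDistance(needle, candidate) <= maxDistance) {
                result.add(candidate);
            }
        }
        // The sort is stable, so ties keep the dictionary's alphabetical order.
        result.sort(Comparator.comparingInt((String c) -> editDistance(needle, c)));
        return result;
    }

    // Classic Levenshtein distance using two rolling rows.
    static int editDistance(String a, String b) {
        int[] prev = new int[b.length() + 1];
        int[] cur = new int[b.length() + 1];
        for (int j = 0; j <= b.length(); j++) {
            prev[j] = j;
        }
        for (int i = 1; i <= a.length(); i++) {
            cur[0] = i;
            for (int j = 1; j <= b.length(); j++) {
                int cost = a.charAt(i - 1) == b.charAt(j - 1) ? 0 : 1;
                cur[j] = Math.min(Math.min(cur[j - 1] + 1, prev[j] + 1), prev[j - 1] + cost);
            }
            int[] tmp = prev;
            prev = cur;
            cur = tmp;
        }
        return prev[b.length()];
    }
}
```

With a dictionary of "lucene", "query", "index" and "search", isKnown("lusene") is false, so the query term would be underlined, and suggest("lusene", 1) offers "lucene". Scanning every term like this is linear in the dictionary size, which may be too slow for a large index.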

Jon



> -----Original Message-----
> From: Brian Mila [mailto:bmila@iastate.edu] 
> Sent: Thursday, June 26, 2003 3:54 PM
> To: lucene-user@jakarta.apache.org
> Subject: misspelled queries
> 
> 
> Hi,
> 
> I've been thinking about trying to implement a misspelling or
> similarity match, a la Google's "did you mean ...".  I was
> thinking of using SoundEx or one of the newer algorithms to
> find appropriate suggestions.  To do this, though, I think I
> would need to enumerate every term in the index, which is not
> a practical solution, I suppose.  Has anyone else attempted
> this or had any success with this idea?
> 
> My only other idea would be to generate the SoundEx codes
> for every term as it's indexed and then add those codes to the
> index in a different field. (FYI, here's a link that explains
> SoundEx with example code:
> http://www.codeproject.com/csharp/soundex.asp?target=soundex).
>
> Then the query would search the regular fields and then form a second,
> SoundEx-encoded query and run it on the SoundEx field.  Does this sound
> plausible?  I'd be really interested to hear results if anyone has tried
> this before.
>
> Regards,
> Brian
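
The second-field idea in the quoted message can be sketched roughly as follows. This is only an illustration under stated assumptions: an in-memory map stands in for the extra SoundEx field, no Lucene API is used, and the names SoundexFieldSketch, addTerm and soundsLike are hypothetical. The encoder follows the common American Soundex rules (H and W do not separate equal codes, vowels do).

```java
import java.util.Collections;
import java.util.Locale;
import java.util.Map;
import java.util.Set;
import java.util.TreeMap;
import java.util.TreeSet;

// Hypothetical sketch: a map from SoundEx code to terms plays the role of the second field.
final class SoundexFieldSketch {
    // Digit for each letter A..Z; '0' marks vowels and other uncoded letters.
    private static final String CODES = "01230120022455012623010202";

    private final Map<String, Set<String>> soundexField = new TreeMap<>();

    // American Soundex: first letter plus three digits, zero padded.
    static String soundex(String word) {
        StringBuilder out = new StringBuilder();
        char prev = '0';
        for (char raw : word.toUpperCase(Locale.ROOT).toCharArray()) {
            if (raw < 'A' || raw > 'Z') {
                continue; // skip anything that is not a basic Latin letter
            }
            char code = CODES.charAt(raw - 'A');
            if (out.length() == 0) {
                out.append(raw); // keep the first letter itself
                prev = code;
                continue;
            }
            if (raw == 'H' || raw == 'W') {
                continue; // H and W do not break a run of equal codes
            }
            if (code != '0' && code != prev) {
                out.append(code);
            }
            prev = code;
            if (out.length() == 4) {
                break;
            }
        }
        if (out.length() == 0) {
            return "";
        }
        while (out.length() < 4) {
            out.append('0');
        }
        return out.toString();
    }

    // Called once per term at indexing time: store the term under its code.
    void addTerm(String term) {
        soundexField.computeIfAbsent(soundex(term), k -> new TreeSet<>()).add(term);
    }

    // Called at query time: encode the query term and look it up in the code field.
    Set<String> soundsLike(String queryTerm) {
        return soundexField.getOrDefault(soundex(queryTerm), Collections.<String>emptySet());
    }
}
```

After addTerm("lucene") and addTerm("search"), soundsLike("lusene") contains "lucene" because both words encode to L250. Because the code is computed as each term is indexed, nothing has to enumerate the whole index at query time, which was the concern raised in the quoted message.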






---------------------------------------------------------------------
To unsubscribe, e-mail: lucene-user-unsubscribe@jakarta.apache.org
For additional commands, e-mail: lucene-user-help@jakarta.apache.org

