lucene-java-user mailing list archives

From Jonathan Hager <jha...@gmail.com>
Subject Re: Spell checker
Date Wed, 20 Oct 2004 22:47:32 GMT
I investigated how the algorithm implemented in this spell checker
compares with my simple implementation of a spell checker.

First, here is what my implementation looks like:

//Each word becomes a single Lucene Document

//To find suggestions:
   FuzzyQuery fquery = new FuzzyQuery(new Term("word", word));
   Hits dicthits = dictionarySearcher.search(fquery);

For a simple test I misspelled brown, as follows:
 * bronw
 * bruwn
 * brownz
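
As a rough illustration of why a fuzzy lookup can find brown for all three
misspellings, here is a minimal sketch of plain Levenshtein edit distance.
This is not Lucene's FuzzyQuery code, and FuzzyQuery's actual scoring and
cutoff may differ; the class name EditDistanceSketch, the suggest helper,
the tiny dictionary, and the maximum distance of 2 are all assumptions made
for the example.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;

public class EditDistanceSketch {

    // Classic Levenshtein distance: the minimum number of single-character
    // insertions, deletions, and substitutions needed to turn a into b.
    static int distance(String a, String b) {
        int[] prev = new int[b.length() + 1];
        int[] curr = new int[b.length() + 1];
        for (int j = 0; j <= b.length(); j++) {
            prev[j] = j; // turning "" into the first j chars of b takes j insertions
        }
        for (int i = 1; i <= a.length(); i++) {
            curr[0] = i; // turning the first i chars of a into "" takes i deletions
            for (int j = 1; j <= b.length(); j++) {
                int cost = a.charAt(i - 1) == b.charAt(j - 1) ? 0 : 1;
                curr[j] = Math.min(Math.min(curr[j - 1] + 1, prev[j] + 1),
                                   prev[j - 1] + cost);
            }
            // Reuse the two rows instead of allocating a full matrix.
            int[] tmp = prev;
            prev = curr;
            curr = tmp;
        }
        return prev[b.length()];
    }

    // Hypothetical helper: dictionary words within maxDistance edits of word,
    // in dictionary order (no ranking).
    static List<String> suggest(String word, List<String> dictionary, int maxDistance) {
        List<String> out = new ArrayList<>();
        for (String candidate : dictionary) {
            if (distance(word, candidate) <= maxDistance) {
                out.add(candidate);
            }
        }
        return out;
    }

    public static void main(String[] args) {
        // Assumed toy dictionary; a real index would hold many more words.
        List<String> dictionary = Arrays.asList("brown", "brush", "bronze", "brookline");
        for (String w : new String[] {"bronw", "bruwn", "brownz"}) {
            System.out.println(w + " -> brown distance " + distance(w, "brown")
                + ", suggestions: " + suggest(w, dictionary, 2));
        }
    }
}
```

Note that bronw is two edits from brown under this measure, because a
transposition counts as two substitutions, while bruwn and brownz are one
edit away.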

To validate my test cases, I checked whether Microsoft Word and Google
had any idea what I was trying to spell.  Google suggested brown, brown,
and browns, respectively.

Word's suggestions were:

bronw==>brown, brow
bruwn==>brown, brawn, bruin
brownz==>browns, brown

The suggestions using David Spencer/Nicolas Maisonneuve's algorithm
against my index were:

bronw==>jaron, brooks, citron, brookline
bruwn==>brush
brownz==>bronze, brooks, brooke, brookline


The suggestions using my real simple algorithm against my index were:

bronw==>brown, brwn, brush
bruwn==>brown, brwn, brush
brownz==>brown, bronze

It appears that David Spencer/Nicolas Maisonneuve's spell checking
algorithm returns a broader result set than most commercial algorithms
or a real simple algorithm.  I will be the first to say that this is
just anecdotal evidence and not a rigorous test of either algorithm.
But until extensive testing has been done, I'm going to stick with my
real simple dictionary lookup.

Jonathan

On Wed, 20 Oct 2004 12:56:39 -0400, Aviran <amordo@infosciences.com> wrote:
> Here http://issues.apache.org/bugzilla/showattachment.cgi?attach_id=13009
> 
> Aviran
> http://aviran.mordos.com
> 
> 
> 
> -----Original Message-----
> From: Lynn Li [mailto:lli@hoovers.com]
> Sent: Wednesday, October 20, 2004 10:52 AM
> To: 'Lucene Users List'
> Subject: RE: Spell checker
> 
> Where can I download it?
> 
> Thanks,
> Lynn
> 
> -----Original Message-----
> From: Nicolas Maisonneuve [mailto:nico.maisonneuve@free.fr]
> Sent: Monday, October 11, 2004 1:26 PM
> To: Lucene Users List
> Subject: Spell checker
> 
> hy lucene users
> i developed a Spell checker for lucene inspired by the David Spencer code
> 
> see the wiki doc: http://wiki.apache.org/jakarta-lucene/SpellChecker
> 
> Nicolas Maisonneuve
> 
> ---------------------------------------------------------------------
> To unsubscribe, e-mail: lucene-user-unsubscribe@jakarta.apache.org
> For additional commands, e-mail: lucene-user-help@jakarta.apache.org


