lucene-java-user mailing list archives

Site index · List index
Message view « Date » · « Thread »
Top « Date » · « Thread »
From Doug Cutting <cutt...@apache.org>
Subject Re: combining open office spellchecker with Lucene
Date Fri, 10 Sep 2004 03:09:16 GMT
David Spencer wrote:
> Good heuristics but are there any more precise, standard guidelines as 
> to how to balance or combine what I think are the following possible 
> criteria in suggesting a better choice:

Not that I know of.

> - ignore(penalize?) terms that are rare

I think this one is easy to threshold: ignore matching terms that are 
rarer than the term entered.

> - ignore(penalize?) terms that are common

This, in effect, falls out of the previous criterion.  A term that is 
very common will not have any matching terms that are more common.  As 
an optimization, you could avoid even looking for matching terms when a 
term is very common.

> - terms that are closer (string distance) to the term entered are better

This is the meaty one.

> - terms that start w/ the same 'n' chars as the users term are better

Perhaps.  Are folks really better at spelling the beginning of words?

Doug

---------------------------------------------------------------------
To unsubscribe, e-mail: lucene-user-unsubscribe@jakarta.apache.org
For additional commands, e-mail: lucene-user-help@jakarta.apache.org


Mime
View raw message