lucene-java-user mailing list archives

From Doug Cutting
Subject Re: combining open office spellchecker with Lucene
Date Fri, 10 Sep 2004 03:09:16 GMT
David Spencer wrote:
> Good heuristics but are there any more precise, standard guidelines as 
> to how to balance or combine what I think are the following possible 
> criteria in suggesting a better choice:

Not that I know of.

> - ignore(penalize?) terms that are rare

I think this one is easy to threshold: ignore matching terms that are 
rarer than the term entered.

> - ignore(penalize?) terms that are common

This, in effect, falls out of the previous criterion.  A term that is 
very common will not have any matching terms that are more common.  As 
an optimization, you could avoid even looking for matching terms when a 
term is very common.
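
As a minimal sketch of the two frequency rules above (not code from this thread): the class name FrequencyFilter, the method keepAtLeastAsCommon, the docFreq map, and the commonThreshold cutoff are all hypothetical names chosen for illustration. It assumes the candidates were already drawn from the index and that a per-term document frequency is available as a plain map; in a real index those counts would come from the index reader.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.Map;

/** Hypothetical helper for illustration; not part of Lucene. */
class FrequencyFilter {

    /**
     * Returns the candidates that are at least as common as the entered term.
     * docFreq maps a term to the number of documents containing it (assumed input).
     * commonThreshold is an assumed cutoff above which the entered term is
     * treated as correct and no suggestions are made.
     */
    static List<String> keepAtLeastAsCommon(String entered,
                                            List<String> candidates,
                                            Map<String, Integer> docFreq,
                                            int commonThreshold) {
        int enteredFreq = docFreq.getOrDefault(entered, 0);
        List<String> kept = new ArrayList<>();

        // Optimization: do not look for matches when the term is very common.
        if (enteredFreq >= commonThreshold) {
            return kept;
        }

        // Threshold: ignore matching terms that are rarer than the term entered.
        for (String candidate : candidates) {
            if (docFreq.getOrDefault(candidate, 0) >= enteredFreq) {
                kept.add(candidate);
            }
        }
        return kept;
    }
}
```

Whether a candidate with exactly the same frequency should be kept is a judgment call; this sketch keeps it.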

> - terms that are closer (string distance) to the term entered are better

This is the meaty one.
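
One common way to measure string distance is Levenshtein edit distance. The thread does not say which measure is intended, so the following is only a sketch under that assumption; EditDistance and levenshtein are hypothetical names.

```java
/** Hypothetical helper for illustration; not part of Lucene. */
class EditDistance {

    /** Number of single-character insertions, deletions, or substitutions turning a into b. */
    static int levenshtein(String a, String b) {
        int[] prev = new int[b.length() + 1];
        int[] curr = new int[b.length() + 1];

        // Distance from the empty prefix of a to each prefix of b.
        for (int j = 0; j <= b.length(); j++) {
            prev[j] = j;
        }

        for (int i = 1; i <= a.length(); i++) {
            curr[0] = i; // delete all i characters of a's prefix
            for (int j = 1; j <= b.length(); j++) {
                int cost = (a.charAt(i - 1) == b.charAt(j - 1)) ? 0 : 1;
                int insert = curr[j - 1] + 1;
                int delete = prev[j] + 1;
                int substitute = prev[j - 1] + cost;
                curr[j] = Math.min(Math.min(insert, delete), substitute);
            }
            // Reuse the two rows instead of allocating a full matrix.
            int[] tmp = prev;
            prev = curr;
            curr = tmp;
        }
        return prev[b.length()];
    }
}
```

Candidates with a smaller distance to the entered term would rank higher; how to weigh distance against term frequency is left open here, as it is in the thread.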

> - terms that start w/ the same 'n' chars as the users term are better

Perhaps.  Are folks really better at spelling the beginning of words?

