lucene-java-user mailing list archives

From David Spencer <dave-lucene-u...@tropo.com>
Subject Re: combining open office spellchecker with Lucene
Date Thu, 09 Sep 2004 18:10:05 GMT
Doug Cutting wrote:

> Aad Nales wrote:
> 
>> Before I start reinventing wheels I would like to do a short check to
>> see if anybody else has already tried this. A customer has asked us
>> to look into the possibility of performing a spell check on queries. So
>> far the most promising way of doing this seems to be to create an
>> Analyzer based on OpenOffice's spellchecker. My question is: "has
>> anybody tried this before?"
> 
> 
> Note that a spell checker used with a search engine should use 
> collection frequency information.  That's to say, only "corrections" 
> which are more frequent in the collection than what the user entered 
> should be displayed.  Frequency information can also be used when 
> constructing the checker.  For example, one need never consider 
> proposing terms that occur in very few documents.  And one should not 
> try correction at all for terms which occur in a large proportion of the 
> collection.
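Doug's three frequency rules translate fairly directly into code. Below is a minimal Java sketch, assuming document frequencies are already available as a plain map (in a real index they could be read with IndexReader.docFreq and IndexReader.numDocs). The class name FrequencyGate and both thresholds are hypothetical, not part of Lucene:

```java
import java.util.ArrayList;
import java.util.List;
import java.util.Map;

// Hypothetical helper (not a Lucene class) applying collection-frequency rules.
class FrequencyGate {
    private final Map<String, Integer> docFreq; // term -> number of docs containing it
    private final int numDocs;                  // total documents in the collection
    private final int minDocFreq;               // assumed: candidates rarer than this are never proposed
    private final double maxDocFraction;        // assumed: terms above this doc fraction are not corrected

    FrequencyGate(Map<String, Integer> docFreq, int numDocs, int minDocFreq, double maxDocFraction) {
        this.docFreq = docFreq;
        this.numDocs = numDocs;
        this.minDocFreq = minDocFreq;
        this.maxDocFraction = maxDocFraction;
    }

    private int freq(String term) {
        return docFreq.getOrDefault(term, 0);
    }

    // Do not try to correct terms that occur in a large proportion of the collection.
    boolean shouldCorrect(String term) {
        return numDocs > 0 && (double) freq(term) / numDocs < maxDocFraction;
    }

    // Keep only candidates that are not rare AND are more frequent than the user's term.
    List<String> filter(String userTerm, List<String> candidates) {
        int userFreq = freq(userTerm);
        List<String> kept = new ArrayList<>();
        for (String c : candidates) {
            int f = freq(c);
            if (f >= minDocFreq && f > userFreq) {
                kept.add(c);
            }
        }
        return kept;
    }
}
```

With, say, 100 documents, minDocFreq = 3 and maxDocFraction = 0.5, a term found in 95 documents is left alone, and a candidate found in only 2 documents is never offered.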

Good heuristics, but are there any more precise, standard guidelines for
how to balance or combine what I think are the possible criteria for
suggesting a better choice:

- ignore (or penalize?) terms that are rare
- ignore (or penalize?) terms that are common
- terms that are closer (in string distance) to the term entered are better
- terms that start with the same 'n' chars as the user's term are better
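One possible way to combine these criteria is a weighted score per candidate. This is a sketch under stated assumptions, not a standard guideline: the class SuggestionRanker, the three weights and the prefix length n = 2 are all hypothetical and untuned. Rare terms are penalized through a log-scaled frequency bonus rather than dropped; very common user terms would still need a separate check before any suggestion is offered.

```java
import java.util.ArrayList;
import java.util.Comparator;
import java.util.List;
import java.util.Map;

// Hypothetical ranking of spelling candidates; weights are illustrative only.
class SuggestionRanker {
    static final double W_DIST = 1.0;   // weight for edit-distance similarity
    static final double W_PREFIX = 0.5; // weight for shared leading characters
    static final double W_FREQ = 0.5;   // weight for log-scaled document frequency
    static final int PREFIX_LEN = 2;    // the 'n' in "same first n chars" (assumed)

    // Classic Levenshtein distance: insert, delete and substitute each cost 1.
    static int levenshtein(String a, String b) {
        int[] prev = new int[b.length() + 1];
        int[] cur = new int[b.length() + 1];
        for (int j = 0; j <= b.length(); j++) prev[j] = j;
        for (int i = 1; i <= a.length(); i++) {
            cur[0] = i;
            for (int j = 1; j <= b.length(); j++) {
                int cost = a.charAt(i - 1) == b.charAt(j - 1) ? 0 : 1;
                cur[j] = Math.min(Math.min(cur[j - 1] + 1, prev[j] + 1), prev[j - 1] + cost);
            }
            int[] tmp = prev; prev = cur; cur = tmp; // previous row becomes the one just filled
        }
        return prev[b.length()];
    }

    // Number of leading characters the two terms share, capped at PREFIX_LEN.
    static int sharedPrefix(String a, String b) {
        int limit = Math.min(PREFIX_LEN, Math.min(a.length(), b.length()));
        int k = 0;
        while (k < limit && a.charAt(k) == b.charAt(k)) k++;
        return k;
    }

    // Higher is better: closeness + prefix match + (rare terms get a small bonus).
    static double score(String userTerm, String candidate, int docFreq, int numDocs) {
        int maxLen = Math.max(userTerm.length(), candidate.length());
        double similarity = maxLen == 0 ? 1.0
                : 1.0 - (double) levenshtein(userTerm, candidate) / maxLen;
        double prefix = (double) sharedPrefix(userTerm, candidate) / PREFIX_LEN;
        double freq = numDocs <= 0 ? 0.0 : Math.log1p(docFreq) / Math.log1p(numDocs);
        return W_DIST * similarity + W_PREFIX * prefix + W_FREQ * freq;
    }

    // Returns the candidates sorted best-first.
    static List<String> rank(String userTerm, List<String> candidates,
                             Map<String, Integer> docFreq, int numDocs) {
        List<String> out = new ArrayList<>(candidates);
        out.sort(Comparator.comparingDouble(
                (String c) -> score(userTerm, c, docFreq.getOrDefault(c, 0), numDocs)).reversed());
        return out;
    }
}
```

Choosing the weights is exactly the balancing problem asked about above; the values here only make the example run.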

> 
> Doug
> 
> ---------------------------------------------------------------------
> To unsubscribe, e-mail: lucene-user-unsubscribe@jakarta.apache.org
> For additional commands, e-mail: lucene-user-help@jakarta.apache.org
> 

