lucene-java-user mailing list archives

From David Spencer <dave-lucene-u...@tropo.com>
Subject frequent terms - Re: combining open office spellchecker with Lucene
Date Sat, 11 Sep 2004 00:38:46 GMT
Doug Cutting wrote:

> Aad Nales wrote:
> 
>> Before I start reinventing wheels I would like to do a short check to
>> see if anybody else has already tried this. A customer has requested us
>> to look into the possibility to perform a spell check on queries. So far
>> the most promising way of doing this seems to be to create an Analyzer
>> based on the spellchecker of OpenOffice. My question is: "has anybody
>> tried this before?" 
> 
> 
> Note that a spell checker used with a search engine should use 
> collection frequency information.  That's to say, only "corrections" 
> which are more frequent in the collection than what the user entered 
> should be displayed.  Frequency information can also be used when 
> constructing the checker.  For example, one need never consider 
> proposing terms that occur in very few documents.  


> And one should not 
> try correction at all for terms which occur in a large proportion of the 
> collection.

I keep thinking over this one and I don't understand it. If a user 
misspells a word and the "did you mean" spelling correction algorithm 
determines that a frequent term is a good suggestion, why not suggest 
it? The very fact that it's common could mean that it's more likely that 
the user wanted this word (well, the heuristic here is that users 
frequently search for frequent terms, which is probably wrong, but 
anyway..).

I know in other contexts of IR frequent terms are penalized but in this 
context it seems that frequent terms should be fine...
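
To make the discussion concrete, here is a rough sketch of the three
frequency rules quoted above. It is only an illustration, not anyone's
actual implementation. It assumes one possible reading of the last rule:
that it applies to the term the user typed, not to the suggestion, so
frequent suggestions are still allowed. The class name, the method name and
the two thresholds (minDocFreq, maxQueryRatio) are made up for this example.
The document frequencies are passed in as a plain map; with Lucene they
could presumably be taken from IndexReader.docFreq(Term) and
IndexReader.numDocs().

```java
import java.util.ArrayList;
import java.util.Comparator;
import java.util.List;
import java.util.Map;

// Hypothetical helper: filters spelling candidates by collection frequency.
class FrequencyFilter {

    // query         the term the user entered
    // candidates    suggestions produced by some spell checker (not shown)
    // docFreq       term -> number of documents containing it (assumed input)
    // numDocs       total number of documents in the collection
    // minDocFreq    assumed threshold: never propose terms rarer than this
    // maxQueryRatio assumed threshold: skip correction if the query term
    //               occurs in more than this fraction of the collection
    static List<String> filter(String query, List<String> candidates,
                               Map<String, Integer> docFreq, int numDocs,
                               int minDocFreq, double maxQueryRatio) {
        int queryFreq = docFreq.getOrDefault(query, 0);
        List<String> kept = new ArrayList<>();

        // Rule 3 (as read here): a very common query term is left alone.
        if (numDocs > 0 && (double) queryFreq / numDocs > maxQueryRatio) {
            return kept;
        }

        for (String candidate : candidates) {
            int freq = docFreq.getOrDefault(candidate, 0);
            // Rule 2: drop terms that occur in very few documents.
            // Rule 1: keep only terms more frequent than what was entered.
            if (freq >= minDocFreq && freq > queryFreq) {
                kept.add(candidate);
            }
        }

        // Most frequent suggestion first; a frequent term is not penalized.
        kept.sort(Comparator
                .comparingInt((String c) -> docFreq.getOrDefault(c, 0))
                .reversed());
        return kept;
    }
}
```

With made-up frequencies lucene=500, lucent=3, lucen=1 in a collection of
1000 documents, minDocFreq=5 and maxQueryRatio=0.5, the query "lucen" with
candidates "lucene" and "lucent" keeps only "lucene": "lucent" is too rare
to propose, and "lucene" is both common enough and more frequent than the
query. Under this reading the frequent term is still suggested, which is
the behaviour I am arguing for.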

-- Dave



> 
> Doug
> 
> ---------------------------------------------------------------------
> To unsubscribe, e-mail: lucene-user-unsubscribe@jakarta.apache.org
> For additional commands, e-mail: lucene-user-help@jakarta.apache.org
> 



