lucene-solr-user mailing list archives

From Mark Miller <markrmil...@gmail.com>
Subject Re: Solr 1.3 and spellcheck.onlyMorePopular=true
Date Thu, 29 Jan 2009 19:44:44 GMT
I am not super familiar with the Lucene/Solr spell checking 
implementations, but here is my take:

By saying to only allow more popular, you are restricting suggestions to 
only those that have a higher frequency in the index. The score 
is still by edit distance, but only terms with a higher frequency than 
the term passed in will be suggested. I agree this is odd - it means you 
should only pass in words that you know are misspelled. You can't count 
on the spell checker to kind of do that for you, as it does without the 
more popular setting on.

So that is leaving you with a nasty suggestion. But it looks like the 
edit distance for that suggestion is larger. What you might try is 
adjusting the threshold (the min edit distance) to be a bit higher. That 
may restrict that suggestion. It's not a great solution though. It's 
likely to suggest something else :) Ideally, the spell checker should 
probably be better at not suggesting when you have chosen a good word. 
It doesn't care that you have a good word already - it sees another word 
with greater frequency and within the edit distance allowed.

If you don't set the more popular setting, upon finding a word in the 
index, the spell checker returns the word passed in. With the more 
popular setting on, you get the results you see - it still suggests, 
but it specifically will not suggest the word you passed in itself (the 
comment says, 'that would be silly'). So you will likely see bad 
suggestions for correct words with this setting.
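
To make the behaviour described above concrete, here is a toy Python 
sketch. It is not the Lucene code: the function names, the similarity 
formula, the 0.5 default threshold and the term frequencies are all 
assumptions made up for illustration. It only models the three points 
above (rank by edit distance, filter by frequency, never suggest the 
input itself).

```python
from typing import Dict, List


def edit_distance(a: str, b: str) -> int:
    """Plain Levenshtein distance: insert, delete, substitute each cost 1."""
    prev = list(range(len(b) + 1))
    for i, ca in enumerate(a, start=1):
        cur = [i]
        for j, cb in enumerate(b, start=1):
            cur.append(min(prev[j] + 1,                 # delete ca
                           cur[j - 1] + 1,              # insert cb
                           prev[j - 1] + (ca != cb)))   # substitute
        prev = cur
    return prev[-1]


def similarity(a: str, b: str) -> float:
    """1.0 for identical strings, lower as more edits are needed (assumed formula)."""
    return 1.0 - edit_distance(a, b) / max(len(a), len(b))


def suggest(word: str, freqs: Dict[str, int],
            only_more_popular: bool = False,
            min_similarity: float = 0.5) -> List[str]:
    """Toy model of the suggestion logic; hypothetical, not Solr's API."""
    word_freq = freqs.get(word, 0)
    if not only_more_popular and word_freq > 0:
        return [word]  # word is in the index: hand it back unchanged
    scored = []
    for term, freq in freqs.items():
        if term == word:
            continue  # never suggest the word that was passed in
        if only_more_popular and freq <= word_freq:
            continue  # keep only terms more frequent than the input
        score = similarity(word, term)
        if score >= min_similarity:
            scored.append((score, term))
    # Best similarity first; ties broken alphabetically.
    scored.sort(key=lambda pair: (-pair[0], pair[1]))
    return [term for _, term in scored]


# Invented term frequencies, purely for illustration.
FREQS = {"klein": 40, "kleine": 75, "clean": 300}

misspelled = suggest("klien", FREQS)                            # "klein" ranks first
correct_default = suggest("klein", FREQS)                       # ["klein"]
correct_popular = suggest("klein", FREQS, only_more_popular=True)
# correct_popular is ["kleine", "clean"]: a correct word still gets suggestions
stricter = suggest("klein", FREQS, only_more_popular=True, min_similarity=0.7)
# stricter is ["kleine"]: the farther term is gone, but something is still suggested
```

In this toy index, raising the threshold drops the more distant term 
but a closer, more frequent one is still returned for a correctly 
spelled word, which is the same trade-off as described above.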

- Mark

Nicholas Piasecki wrote:
> Hello All,
>
> I'm new to Solr, so forgive me if I'm overlooking something obvious. My
> observation is that the spellcheck.onlyMorePopular property of the
> SpellCheckComponent seems to not do what I expect.
>
> If I send the query "calvin klien" to my data store, then the spell
> checker correctly suggests "klein" for "klien," and running the new
> "calvin klein" query returns the expected many product results.
>
> However, when sending the correct query of "calvin klein," the spell
> checker will suggest "cin2" (another brand name in our data store) for
> "klein," and running that new "calvin cin2" collated query obviously
> returns zero results.
>
> It would seem to me that the "onlyMorePopular" property, when set to
> true, only performs its calculation of popularity on the particular
> misspelled word alone, and not the query as a whole. Since there are
> indeed more C-IN2 brand products in our database, it returns "cin2" as
> a spelling correction for "klein," seeing that the "cin2" token alone
> returns many results but not bothering to check that "calvin cin2"
> returns none.
>
> A less astonishing behavior would be for it to suggest "cin2", test to
> see how many hits "calvin cin2" returns, see that it returns fewer than
> "calvin klein", and then exclude that suggestion because it is not more
> popular in the context of the original query.
>
> So:
>
> 1 - Is my analysis correct? Is this really how it works?
>
> 2 - Is there a configuration setting that I can use to make the spell
> checker use the desired behavior? Or, should I just immediately submit a
> request with its correlated suggestion with zero rows and do a
> comparison on the results, effectively performing the "onlyMorePopular"
> calculation myself?
>
> Many thanks; so far, Solr is proving to be an excellent product!
>
> V/R,
> Nicholas Piasecki
>
> Software Developer
> Skiviez, Inc.
> 1-800-628-1693 x6003
> nick@skiviez.com
>
>
>   
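
The client-side check proposed in question 2 of the quoted message 
(re-run the collated query with zero rows and compare hit counts) 
could be sketched roughly as follows. This is only an illustration: 
accept_collation and count_hits are hypothetical names, and the hit 
counts are invented. In a real setup count_hits would presumably 
issue a rows=0 request to Solr and read the total hit count from the 
response; here it is a stand-in so the sketch runs on its own.

```python
from typing import Callable, Optional


def accept_collation(original: str, collated: Optional[str],
                     count_hits: Callable[[str], int]) -> str:
    """Return the collated query only if it matches more documents
    than the original query; otherwise keep the original."""
    if not collated or collated == original:
        return original
    if count_hits(collated) > count_hits(original):
        return collated
    return original


# Stand-in for a zero-row request that only reports the number of hits.
# The numbers are invented for illustration.
FAKE_COUNTS = {"calvin klein": 120, "calvin cin2": 0, "calvin klien": 0}


def fake_count_hits(query: str) -> int:
    return FAKE_COUNTS.get(query, 0)


kept = accept_collation("calvin klein", "calvin cin2", fake_count_hits)
fixed = accept_collation("calvin klien", "calvin klein", fake_count_hits)
# kept is "calvin klein" (the bad collation is rejected);
# fixed is "calvin klein" (the useful correction is accepted).
```

The cost of this approach is one or two extra requests per query that 
comes back with a collation, so it may be worth caching the counts.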

