lucene-solr-user mailing list archives

From "Dyer, James" <James.D...@ingramcontent.com>
Subject RE: Solr Spellcheck suggestions only return from /select handler when returning search results
Date Thu, 11 Sep 2014 12:58:17 GMT
Thomas,

Yes, you are right: the problem is that the character needing correction is at
the beginning of the word. If you are using DirectSolrSpellChecker, you need to
set the "minPrefix" parameter to 0. Otherwise, the default (1) requires the
first character of a candidate to match the query term before it will attempt a
correction.

See http://lucene.apache.org/core/4_10_0/suggest/org/apache/lucene/search/spell/DirectSpellChecker.html#setMinPrefix%28int%29
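
As a rough sketch of where that setting would go, assuming the "default"
spellchecker definition quoted further down in this thread, the block might
look like this (only the minPrefix line is new; everything else is copied from
your configuration):

```xml
<lst name="spellchecker">
  <str name="name">default</str>
  <str name="field">spell</str>
  <str name="classname">solr.DirectSolrSpellChecker</str>
  <str name="distanceMeasure">internal</str>
  <!-- allow corrections that change the first character; the default of 1
       requires the first character of a candidate to match the query term -->
  <int name="minPrefix">0</int>
</lst>
```

With this in place, a query such as "ichtscheiben" should become eligible for a
suggestion like "Sichtscheiben". A shorter required prefix generally means more
candidate terms are examined, so it may cost some performance.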

James Dyer
Ingram Content Group
(615) 213-4311


-----Original Message-----
From: Thomas Michael Engelke [mailto:thomas.engelke@posteo.de] 
Sent: Thursday, September 11, 2014 3:46 AM
To: solr-user@lucene.apache.org
Subject: RE: Solr Spellcheck suggestions only return from /select handler when returning search results

Hi James, hi list,

I can confirm that the index contains data within 1 Levenshtein edit of
"ichtscheiben":

{
  "responseHeader": {
    "status": 0,
    "QTime": 0,
    "params": {
      "fl": "name,spell",
      "indent": "true",
      "q": "name:Sichtscheiben",
      "_": "1410423419758",
      "wt": "json",
      "rows": "50"
    }
  },
  "response": {
    "numFound": 6,
    "start": 0,
    "docs": [
      {
        "name": "Sichtscheiben",
        "spell": "Sichtscheiben"
      },
      {
        "name": "Sichtscheiben",
        "spell": "Sichtscheiben"
      },
      {
        "name": "Sichtscheiben",
        "spell": "Sichtscheiben"
      },
      {
        "name": "Sichtscheiben",
        "spell": "Sichtscheiben"
      },
      {
        "name": "Sichtscheiben",
        "spell": "Sichtscheiben"
      },
      {
        "name": "Sichtscheiben",
        "spell": "Sichtscheiben"
      }
    ]
  }
}

Multiple records exist that should match.

The note for alternativeTermCount is appreciated.
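
For anyone following along, here is a minimal sketch of how the two request
parameters from James's earlier reply could be set as defaults on the /select
handler. The values are only the ones floated in this thread, the "defaults"
list is an assumption about the elided part of my handler, and any other
existing settings are left out:

```xml
<requestHandler name="/select" class="solr.SearchHandler">
  <lst name="defaults">
    <!-- other defaults omitted -->
    <!-- also try to correct query terms that already occur in the index -->
    <str name="spellcheck.alternativeTermCount">5</str>
    <!-- optional: only return suggestions when the query has no results -->
    <str name="spellcheck.maxResultsForSuggest">0</str>
  </lst>
  <arr name="last-components">
    <str>spellcheck</str>
  </arr>
</requestHandler>
```

The same parameters can also be passed per request on the query string instead
of being fixed in solrconfig.xml.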

I've tried another term: "Transport". I get suggestions when I use "Transpor"
and "Transpo", even "Transpotr", but "ransport" doesn't yield any suggestions.
Maybe the issue is specific to the beginning of a word and doesn't really have
anything to do with stemming.

On 10.09.2014 at 15:19, Dyer, James wrote:

> Thomas,
> 
> It looks like you've set things up correctly in that while the user is
> searching against a stemmed field ("name"), spellcheck is checking against a
> lightly-analyzed copy of it ("spell"). This is the right way to do it, as
> spellcheck against stemmed forms is usually undesirable.
>
> But as you've experienced, you will sometimes get results (due to stemming)
> and also suggestions (because the spellchecker is looking at unstemmed
> forms). If you do not want spellcheck to return anything when you get
> results, you can set "spellcheck.maxResultsForSuggest=0".
>
> Now, keeping in mind we're comparing unstemmed forms, can you verify you
> indeed have something in your index that is within 2 edits of
> "ichtscheiben"? My guess is you probably don't, which would be why you do not
> get spelling results in that case.
>
> Also, even if you do have something within 2 edits, if "ichtscheiben" occurs
> in your index, by default it won't try to correct it at all (even if the
> query returns nothing, maybe because of filters or other required terms on
> the query). In this case you need to set "spellcheck.alternativeTermCount" to
> a non-zero value (try maybe 5).
>
> See
> http://wiki.apache.org/solr/SpellCheckComponent#spellcheck.alternativeTermCount
> [1] and following sections.
> 
> James Dyer
> Ingram Content Group
> (615) 213-4311
> 
> -----Original Message-----
> From: Thomas Michael Engelke [mailto:thomas.engelke@posteo.de]
> Sent: Wednesday, September 10, 2014 5:00 AM
> To: Solr user
> Subject: Solr Spellcheck suggestions only return from /select handler when returning search results
> 
> Hi,
>
> I'm experimenting with the Spellcheck component and have therefore used the
> example configuration for spell checking to try things out. My
> solrconfig.xml looks like this:
> 
> <searchComponent name="spellcheck" class="solr.SpellCheckComponent">
>   <str name="queryAnalyzerFieldType">spell</str>
>   <!-- Multiple "Spell Checkers" can be declared and used by this component -->
>   <!-- a spellchecker built from a field of the main index -->
>   <lst name="spellchecker">
>     <str name="name">default</str>
>     <str name="field">spell</str>
>     <str name="classname">solr.DirectSolrSpellChecker</str>
>     <!-- the spellcheck distance measure used, the default is the internal levenshtein -->
>     <str name="distanceMeasure">internal</str>
>     <!-- uncomment this to require suggestions to occur in 1% of the documents
>     <float name="thresholdTokenFrequency">.01</float>
>     -->
>   </lst>
>   <!-- a spellchecker that can break or combine words. See "/spell" handler below for usage -->
>   <lst name="spellchecker">
>     <str name="name">wordbreak</str>
>     <str name="classname">solr.WordBreakSolrSpellChecker</str>
>     <str name="field">spell</str>
>     <str name="combineWords">true</str>
>     <str name="breakWords">true</str>
>     <int name="maxChanges">10</int>
>   </lst>
> </searchComponent>
> 
> And I've added the spellcheck component to my /select request handler:
>
> <requestHandler name="/select" class="solr.SearchHandler">
>   ...
>   <arr name="last-components">
>     <str>spellcheck</str>
>   </arr>
> </requestHandler>
> 
> I have built up the spellchecker source in the schema.xml from the name
> field:
>
> <field name="spell" type="spell" indexed="true" stored="true" required="false" multiValued="false"/>
> <copyField source="name" dest="spell" maxChars="30000" />
> ...
> <fieldType name="spell" class="solr.TextField" positionIncrementGap="100">
>   <analyzer type="index">
>     <tokenizer class="solr.StandardTokenizerFactory"/>
>   </analyzer>
>   <analyzer type="query">
>     <tokenizer class="solr.StandardTokenizerFactory"/>
>   </analyzer>
> </fieldType>
> 
> As I'm querying the /select request handler, I should get spellcheck
> suggestions with my results. However, I rarely get a suggestion. Examples:
>
> query: Sichtscheibe, spellcheck suggestion: Sichtscheiben (works)
> query: Sichtscheib, spellcheck suggestion: Sichtscheiben (works)
> query: ichtscheiben, no spellcheck suggestions
>
> As far as I can tell, I only get suggestions when I get real search results.
> I get results for the first 2 examples because the German StemFilterFactory
> reduces "Sichtscheibe" and "Sichtscheiben" to "Sichtscheib", so matches are
> found. However, the third query should result in a suggestion, as the
> Levenshtein distance is less than in the second example.
>
> Suggestions, improvements, corrections?

 

Links:
------
[1] http://wiki.apache.org/solr/SpellCheckComponent#spellcheck.alternativeTermCount