lucene-solr-user mailing list archives

From Dejan Caric <dejan.ca...@gmail.com>
Subject Re: Solr Suggester component doesn't return hits for non-English words
Date Wed, 13 Mar 2013 10:11:32 GMT
Thank you, Carlos, and sorry for the late reply.
I've set the threshold to 0 and that did the trick.
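
For anyone finding this thread later, the change amounts to one line in the spellchecker definition. A minimal sketch, assuming the rest of the `suggest` component stays exactly as in the original message quoted below:

```xml
<searchComponent class="solr.SpellCheckComponent" name="suggest">
    <lst name="spellchecker">
        <str name="name">suggest</str>
        <str name="classname">org.apache.solr.spelling.suggest.Suggester</str>
        <str name="lookupImpl">org.apache.solr.spelling.suggest.tst.TSTLookup</str>
        <str name="field">autosuggest_general</str>
        <!-- was 0.005; 0 removes the frequency cutoff so rare terms are kept -->
        <float name="threshold">0</float>
        <str name="buildOnCommit">true</str>
    </lst>
</searchComponent>
```

The suggester dictionary most likely has to be rebuilt before the new value takes effect; with buildOnCommit set to true, that should happen on the next commit.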


Kind regards,

Dejan


On Tue, Feb 26, 2013 at 3:05 AM, Carlos Maroto <
CMaroto@searchtechnologies.com> wrote:

> Hi Dejan,
>
> I wouldn't say your problem is caused by the words being non-English, as
> nothing in Solr indicates whether a term is English or not.  I think it is
> a configuration issue in your implementation for the current data set or
> test.  I would start by trying the following:
>
> - In the <searchComponent>, the <threshold> parameter may prevent either
> or both of your suggestions from being considered.  Make sure that "marcos"
> and "dejan" appear in at least 0.5% (per the 0.005 value in the parameter)
> of your document set.  If they don't, then that explains it: the suggester
> considers those terms too rare to be included as suggestions.  Try setting
> it to 0 to find out whether the suggester returns them then (check the
> references to "threshold" in the Suggester wiki article, particularly the
> details at http://wiki.apache.org/solr/Suggester#Dictionary ).
> - If you still don't get them as suggestions but you get some new
> suggestions as a result of the new <threshold> value, then you may have a
> lot of other rare terms matching "mar" or "de" and you'd need to adjust
> other parameters, such as "spellcheck.count" in the <requestHandler> or
> others
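
As a rough worked example of the threshold check: assume the suggester keeps a term only when its document frequency is at least `threshold` times the number of documents, and write N for the total number of documents in the index (not stated anywhere in this thread). The hit counts 10 and 7 come from the select queries quoted further down:

```latex
\[
\frac{10}{N} \ge 0.005 \iff N \le 2000
\qquad\text{(marcos, 10 hits)}
\]
\[
\frac{7}{N} \ge 0.005 \iff N \le 1400
\qquad\text{(dejan, 7 hits)}
\]
```

Under that assumption, an index of more than 2000 documents puts both terms below the cutoff, so neither would be suggested.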
>
> Additionally, check your configuration in general.  For example, the
> <requestHandler> has "spellcheck.onlymorepopular" all in lowercase and Solr
> may ignore it (the correct name is "spellcheck.onlyMorePopular").  You may
> not care about it and it shouldn't affect your current case, but it is
> better to reduce things to basics when troubleshooting (remove or disable
> settings you don't need until you resolve the current issue).
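
A sketch of the request handler defaults with the parameter name corrected, assuming the other values stay as posted in the original message:

```xml
<requestHandler class="org.apache.solr.handler.component.SearchHandler" name="/suggest">
    <lst name="defaults">
        <str name="spellcheck">true</str>
        <str name="spellcheck.dictionary">suggest</str>
        <!-- camel-case name; the all-lowercase spelling may be silently ignored -->
        <str name="spellcheck.onlyMorePopular">true</str>
        <str name="spellcheck.count">5</str>
        <str name="spellcheck.collate">true</str>
    </lst>
    <arr name="components">
        <str>suggest</str>
    </arr>
</requestHandler>
```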
>
> Hope this helps,
> Carlos
> www.searchtechnologies.com
>
> -----Original Message-----
> From: Dejan Caric [mailto:dejan.caric@gmail.com]
> Sent: Sunday, February 24, 2013 4:35 AM
> To: solr-user@lucene.apache.org
> Subject: Solr Suggester component doesn't return hits for non-English words
>
> Hi everyone,
>
> I have defined a suggest component like this:
>
> <searchComponent class="solr.SpellCheckComponent" name="suggest">
>     <lst name="spellchecker">
>         <str name="name">suggest</str>
>         <str name="classname">org.apache.solr.spelling.suggest.Suggester</str>
>         <str name="lookupImpl">org.apache.solr.spelling.suggest.tst.TSTLookup</str>
>
>         <str name="field">autosuggest_general</str>
>         <float name="threshold">0.005</float>
>         <str name="buildOnCommit">true</str>
>     </lst>
> </searchComponent>
> <requestHandler class="org.apache.solr.handler.component.SearchHandler" name="/suggest">
>     <lst name="defaults">
>         <str name="spellcheck">true</str>
>         <str name="spellcheck.dictionary">suggest</str>
>         <str name="spellcheck.onlymorepopular">true</str>
>         <str name="spellcheck.count">5</str>
>         <str name="spellcheck.collate">true</str>
>     </lst>
>     <arr name="components">
>         <str>suggest</str>
>     </arr>
> </requestHandler>
>
> and autosuggest_general field like this:
>
> <field name="autosuggest_general" type="autosuggest_type" indexed="true" stored="true" multiValued="true" />
> <fieldType name="autosuggest_type" class="solr.TextField" positionIncrementGap="100">
>     <analyzer>
>         <charFilter class="solr.HTMLStripCharFilterFactory"/>
>         <tokenizer class="solr.WhitespaceTokenizerFactory"/>
>         <filter class="solr.WordDelimiterFilterFactory"
>                 generateWordParts="1" generateNumberParts="1" catenateWords="1"
>                 catenateNumbers="1" catenateAll="0" splitOnCaseChange="1"/>
>         <filter class="solr.LowerCaseFilterFactory"/>
>     </analyzer>
> </fieldType>
>
> The suggester component doesn't return any hits for non-English words.
>
> I want to get auto-complete for the word `Marcos`.
> So when I call http://localhost:8983/solr/mycore/suggest?q=mar I get the
> following response:
>
> <response>
>     <lst name="responseHeader">
>         <int name="status">0</int>
>         <int name="QTime">2</int>
>     </lst>
>     <lst name="spellcheck">
>         <lst name="suggestions"/>
>     </lst>
> </response>
>
> And regular search returns 10 hits:
> http://localhost:8983/solr/mycore/select?q=autosuggest_general:marcos
>
> For `de` I get the following response:
>
> <response>
>     <lst name="responseHeader">
>         <int name="status">0</int>
>         <int name="QTime">1</int>
>     </lst>
>     <lst name="spellcheck">
>         <lst name="suggestions">
>             <lst name="de">
>                 <int name="numFound">3</int>
>                 <int name="startOffset">0</int>
>                 <int name="endOffset">2</int>
>                 <arr name="suggestion">
>                     <str>design</str>
>                     <str>developer</str>
>                     <str>development</str>
>                 </arr>
>             </lst>
>             <str name="collation">design</str>
>         </lst>
>     </lst>
> </response>
>
> `design`, `developer`, and `development` are fine, but I don't get `dejan`
> in the suggestions, and that word does exist in the autosuggest_general field.
>
> http://localhost:8983/solr/mycore/select?q=autosuggest_general:dejan returns
>
> <response>
>     <lst name="responseHeader">
>         <int name="status">0</int>
>         <int name="QTime">1</int>
>         <lst name="params">
>             <str name="q">autosuggest_general:dejan</str>
>         </lst>
>     </lst>
>     <result name="response" numFound="7" start="0">
>     ...
>     </result>
> </response>
>
> I'm using Solr 4.1
>
> Any help would be greatly appreciated!
>
> // Dejan
>
