lucene-solr-user mailing list archives

From Grant Ingersoll <gsing...@apache.org>
Subject Re: SpellChecker Component
Date Wed, 12 Nov 2008 14:48:18 GMT
See https://issues.apache.org/jira/browse/LUCENE-1417 and http://lucene.markmail.org/message/sktohlgqxcpmpf7z?q=list:org%2Eapache%2Elucene%2Esolr-user+spellchecker+Rennie

In short, frequency is the second-order sort key: suggestions are
ordered by distance score first, and frequency only breaks ties. I
think the ordering should be made pluggable. A patch would be most
welcome. I don't have time to produce one at the moment, but I can
shepherd it through.
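
To make the current behaviour concrete, here is a minimal Java sketch.
It is not the Solr or Lucene implementation. It assumes the default
score is 1 - (edit distance / length of the longer word), compared
case-sensitively. Suggestion, SCORE_THEN_FREQ and FREQ_THEN_SCORE are
hypothetical names. The words and frequencies are the ones from Jeff's
result quoted below.

```java
import java.util.ArrayList;
import java.util.Comparator;
import java.util.List;

final class SuggestionSortSketch {

    /** Hypothetical holder for one suggestion; not a Solr or Lucene class. */
    static final class Suggestion {
        final String word;
        final double score;
        final int freq;

        Suggestion(String word, double score, int freq) {
            this.word = word;
            this.score = score;
            this.freq = freq;
        }
    }

    // Words and frequencies copied from the quoted result set.
    static final String[] WORDS = {"Mice", "Vice", "Nice", "Bice", "Dice", "Nike"};
    static final int[] FREQS = {47, 26, 14, 4, 1, 4099};

    /** Plain Levenshtein distance, case-sensitive. */
    public static int editDistance(String a, String b) {
        int[] prev = new int[b.length() + 1];
        int[] curr = new int[b.length() + 1];
        for (int j = 0; j <= b.length(); j++) prev[j] = j;
        for (int i = 1; i <= a.length(); i++) {
            curr[0] = i;
            for (int j = 1; j <= b.length(); j++) {
                int cost = a.charAt(i - 1) == b.charAt(j - 1) ? 0 : 1;
                curr[j] = Math.min(Math.min(curr[j - 1] + 1, prev[j] + 1), prev[j - 1] + cost);
            }
            int[] tmp = prev;
            prev = curr;
            curr = tmp;
        }
        return prev[b.length()];
    }

    /** Assumed scoring: 1 - distance / length of the longer word. */
    public static double similarity(String a, String b) {
        int longest = Math.max(a.length(), b.length());
        return longest == 0 ? 1.0 : 1.0 - (double) editDistance(a, b) / longest;
    }

    /** Score first, frequency only as a tie-breaker. */
    public static final Comparator<Suggestion> SCORE_THEN_FREQ =
        Comparator.comparingDouble((Suggestion s) -> s.score).reversed()
            .thenComparing(Comparator.comparingInt((Suggestion s) -> s.freq).reversed());

    /** Frequency first, score only as a tie-breaker. */
    public static final Comparator<Suggestion> FREQ_THEN_SCORE =
        Comparator.comparingInt((Suggestion s) -> s.freq).reversed()
            .thenComparing(Comparator.comparingDouble((Suggestion s) -> s.score).reversed());

    /** Scores every candidate against the query and returns the words in sorted order. */
    public static List<String> rank(String query, Comparator<Suggestion> order) {
        List<Suggestion> suggestions = new ArrayList<>();
        for (int i = 0; i < WORDS.length; i++) {
            suggestions.add(new Suggestion(WORDS[i], similarity(query, WORDS[i]), FREQS[i]));
        }
        suggestions.sort(order);
        List<String> words = new ArrayList<>();
        for (Suggestion s : suggestions) words.add(s.word);
        return words;
    }

    public static void main(String[] args) {
        System.out.println("score, then frequency: " + rank("nice", SCORE_THEN_FREQ));
        System.out.println("frequency, then score: " + rank("nice", FREQ_THEN_SCORE));
    }
}
```

Under those assumptions "Nike" is two edits from "nice" (N and k) while
the other words are one edit away, so the score-first ordering gives
Mice, Vice, Nice, Bice, Dice, Nike, which is the order Jeff reports.
The frequency-first comparator puts Nike on top. A pluggable comparator
would let the configuration choose between the two.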

FWIW, you might also try the Jaro-Winkler (JW) distance as the
default. Edit distance is not as good, since it treats differences
the same no matter where in the word they occur, whereas most people
tend to make spelling mistakes later on in a word. I believe JW takes
this into account by scoring words that share a prefix more highly.
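
A minimal Java sketch of the difference, using the textbook
Jaro-Winkler formula (prefix scale 0.1, at most 4 prefix characters).
It is an illustration only, not the code in
org.apache.lucene.search.spell.JaroWinklerDistance, whose details may
differ. The class and method names are hypothetical.

```java
final class JaroWinklerSketch {

    /** Textbook Jaro similarity in [0, 1]. */
    public static double jaro(String s, String t) {
        if (s.isEmpty() && t.isEmpty()) return 1.0;
        // Characters count as matching only within this distance of each other.
        int window = Math.max(0, Math.max(s.length(), t.length()) / 2 - 1);
        boolean[] sMatched = new boolean[s.length()];
        boolean[] tMatched = new boolean[t.length()];
        int matches = 0;
        for (int i = 0; i < s.length(); i++) {
            int lo = Math.max(0, i - window);
            int hi = Math.min(t.length() - 1, i + window);
            for (int j = lo; j <= hi; j++) {
                if (!tMatched[j] && s.charAt(i) == t.charAt(j)) {
                    sMatched[i] = true;
                    tMatched[j] = true;
                    matches++;
                    break;
                }
            }
        }
        if (matches == 0) return 0.0;
        // Count matched characters that appear in a different order.
        int outOfOrder = 0;
        int k = 0;
        for (int i = 0; i < s.length(); i++) {
            if (!sMatched[i]) continue;
            while (!tMatched[k]) k++;
            if (s.charAt(i) != t.charAt(k)) outOfOrder++;
            k++;
        }
        double m = matches;
        return (m / s.length() + m / t.length() + (m - outOfOrder / 2.0) / m) / 3.0;
    }

    /** Jaro similarity boosted by the length of the shared prefix (up to 4 characters). */
    public static double jaroWinkler(String s, String t) {
        double j = jaro(s, t);
        int limit = Math.min(4, Math.min(s.length(), t.length()));
        int prefix = 0;
        while (prefix < limit && s.charAt(prefix) == t.charAt(prefix)) prefix++;
        return j + prefix * 0.1 * (1.0 - j);
    }

    /** Edit-distance similarity: 1 - Levenshtein distance / length of the longer word. */
    public static double editSimilarity(String a, String b) {
        int[] prev = new int[b.length() + 1];
        int[] curr = new int[b.length() + 1];
        for (int j = 0; j <= b.length(); j++) prev[j] = j;
        for (int i = 1; i <= a.length(); i++) {
            curr[0] = i;
            for (int j = 1; j <= b.length(); j++) {
                int cost = a.charAt(i - 1) == b.charAt(j - 1) ? 0 : 1;
                curr[j] = Math.min(Math.min(curr[j - 1] + 1, prev[j] + 1), prev[j - 1] + cost);
            }
            int[] tmp = prev;
            prev = curr;
            curr = tmp;
        }
        int longest = Math.max(a.length(), b.length());
        return longest == 0 ? 1.0 : 1.0 - (double) prev[b.length()] / longest;
    }

    public static void main(String[] args) {
        String target = "converse";
        // One typo at the start of the word, one at the end.
        for (String typo : new String[] {"konverse", "conversr"}) {
            System.out.printf("%s: edit=%.3f jw=%.3f%n",
                typo, editSimilarity(typo, target), jaroWinkler(typo, target));
        }
    }
}
```

Both typos of "converse" are one edit away, so the edit-distance
similarity is the same for both. Jaro-Winkler scores "conversr" higher
than "konverse" because only "conversr" shares a prefix with the target.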

On Nov 11, 2008, at 11:52 AM, Jeff Newburn wrote:

> Ok.  I have managed to get the search component added (You rock Grant).
> I am having some interesting issues now with the suggestions.  We sell
> shoes online so I am trying to get it to spellcheck for brand name.
>
> When I search konverse with spelling on, it returns converse correctly;
> however, when I search nice (instead of nike) I am returned all sorts of
> results not sorted by frequency.  I have even turned on onlyMorePopular
> but it is still returning all of the different words in no order.  Nike
> is by far the most frequent term; how do I get it to the top?
>
> I am currently using the svn build of solr1.4.  I have included the
> configuration as well as the resultset return for spelling suggestions.
>
>
> Below is the configuration:
>  <searchComponent name="spellcheck" class="solr.SpellCheckComponent">
>
>    <!--<str name="queryAnalyzerFieldType">textSpell</str>-->
>    <str name="buildOnCommit">true</str>
>
>    <lst name="spellchecker">
>      <str name="name">default</str>
>      <str name="classname">solr.IndexBasedSpellChecker</str>
>      <str name="field">word</str>
>      <str name="spellcheckIndexDir">./spellchecker1</str>
>      <str name="accuracy">0.5</str>
>    </lst>
>    <lst name="spellchecker">
>      <str name="name">jarowinkler</str>
>      <str name="field">word</str>
>      <!-- Use a different Distance Measure -->
>      <str name="distanceMeasure">org.apache.lucene.search.spell.JaroWinklerDistance</str>
>      <str name="spellcheckIndexDir">./spellchecker2</str>
>
>    </lst>
>
>    <lst name="spellchecker">
>      <str name="classname">solr.FileBasedSpellChecker</str>
>      <str name="name">file</str>
>      <str name="sourceLocation">spellings.txt</str>
>      <str name="characterEncoding">UTF-8</str>
>      <str name="indexDir">./spellcheckerFile</str>
>    </lst>
>  </searchComponent>
>
> Return results:
> <lst name="spellcheck">
>   <lst name="suggestions">
>     <lst name="nice">
>       <int name="numFound">20</int>
>       <int name="startOffset">0</int>
>       <int name="endOffset">4</int>
>       <int name="origFreq">0</int>
>       <lst name="suggestion">
>         <int name="frequency">47</int>
>         <str name="word">Mice</str>
>       </lst>
>       <lst name="suggestion">
>         <int name="frequency">26</int>
>         <str name="word">Vice</str>
>       </lst>
>       <lst name="suggestion">
>         <int name="frequency">14</int>
>         <str name="word">Nice</str>
>       </lst>
>       <lst name="suggestion">
>         <int name="frequency">4</int>
>         <str name="word">Bice</str>
>       </lst>
>       <lst name="suggestion">
>         <int name="frequency">1</int>
>         <str name="word">Dice</str>
>       </lst>
>       <lst name="suggestion">
>         <int name="frequency">4099</int>
>         <str name="word">Nike</str>
>       </lst>
>
>
> On 11/11/08 4:39 AM, "Grant Ingersoll" <gsingers@apache.org> wrote:
>
>> Hi Jeff,
>>
>> A SearchComponent allows you to connect functionality with any Request
>> Handler, allowing you to inline spelling requests (or other things
>> like MoreLikeThis) with your queries, saving you from having to make
>> an extra request.
>>
>> I walk through a lot of this in my article on Solr 1.3 for IBM
>> devWorks:
>> http://www.ibm.com/developerworks/java/library/j-solr-update/?S_TACT=105AGX01&S_CMP=HP
>>
>> You can also refer to the Wiki at:
>> http://wiki.apache.org/solr/SearchComponent
>> and specifically:
>> http://wiki.apache.org/solr/SpellCheckComponent
>>
>> It works independently from the query parser (i.e. dismax).
>>
>> -Grant
>>
>>
>> On Nov 10, 2008, at 7:00 PM, Jeff Newburn wrote:
>>
>>> I am still relatively new to solr.  I have gotten the
>>> spellcheckerrequesthandler working the way I would like.  Now I am
>>> diving into the search component version of the spell checker.  I was
>>> hoping someone could help explain 1. What specifically does the
>>> searchcomponent offer and how would I go about putting it into all
>>> search terms with the dismax type.
>>>
>>> -Jeff
>>
>> --------------------------
>> Grant Ingersoll
>>
>> Lucene Helpful Hints:
>> http://wiki.apache.org/lucene-java/BasicsOfPerformance
>> http://wiki.apache.org/lucene-java/LuceneFAQ

--------------------------
Grant Ingersoll

Lucene Helpful Hints:
http://wiki.apache.org/lucene-java/BasicsOfPerformance
http://wiki.apache.org/lucene-java/LuceneFAQ










