lucene-solr-user mailing list archives

From Rohan Thakur <rohan.i...@gmail.com>
Subject Re: how to avoid single character to get indexed for directspellchecker dictionary
Date Fri, 05 Apr 2013 10:45:14 GMT
Hi James,

I have tried LengthFilterFactory as well. It does seem to remove the single
characters from the index, but when I query for "delll" it still gives
"dell l" in the suggestions. I think this is because, for a query like
"dell l", Solr tokenises it into "dell" and "l" and returns the documents
that contain "dell". To remove such suggestions, do I have to use
minBreakLength? And what is the significance of the minBreakLength number?


On Fri, Apr 5, 2013 at 12:20 PM, Rohan Thakur <rohan.iitd@gmail.com> wrote:

> Hi James,
>
> After using this it is working fine for "delll" but not for "dellll". What
> does minBreakLength signify?
>
>
> Also, can you tell me why I am not getting suggestions for shorter words?
> For example, for "del" I should get "dell" as a suggestion, but it is not
> giving any suggestions. And can I get completion-style suggestions, so that
> if I give "sams" it also gives "samsung" as a suggestion?
>
> thanks
> regards
> Rohan
>
>
>
>
> On Fri, Apr 5, 2013 at 12:54 AM, Dyer, James <James.Dyer@ingramcontent.com
> > wrote:
>
>> I assume if your user queries "delll" and it breaks it into pieces like
>> "de l l l", then you're probably using WordBreakSolrSpellChecker in
>> addition to DirectSolrSpellChecker, right?  If so, then you can specify
>> "minBreakLength" in solrconfig.xml like this:
>>
>> <searchComponent name="spellcheck" class="solr.SpellCheckComponent">
>>   ... spellcheckers here ...
>>   <lst name="spellchecker">
>>     <str name="name">wordbreak</str>
>>     <str name="classname">solr.WordBreakSolrSpellChecker</str>
>>     ... parameters here ...
>>     <int name="minBreakLength">5</int>
>>   </lst>
>> </searchComponent>
>>
>> One note is that both DirectSolrSpellChecker and
>> WordBreakSolrSpellChecker operate directly on the terms dictionary and do
>> not have a separate dictionary like IndexBasedSpellChecker.  The only way
>> to prevent a word from being in the dictionary then is to filter this out
>> in the analysis chain.  For instance, if you use <copyField /> to build a
>> field just for spellchecking, you can use LengthFilterFactory to remove the
>> short terms.  See
>> http://wiki.apache.org/solr/AnalyzersTokenizersTokenFilters#solr.LengthFilterFactory.
>>
>> James Dyer
>> Ingram Content Group
>> (615) 213-4311
>>
>>
>> -----Original Message-----
>> From: Rohan Thakur [mailto:rohan.iitd@gmail.com]
>> Sent: Thursday, April 04, 2013 1:42 PM
>> To: solr-user@lucene.apache.org
>> Subject: how to avoid single character to get indexed for
>> directspellchecker dictionary
>>
>> hi all
>>
>> I am using Solr's DirectSolrSpellChecker for spelling suggestions, with raw
>> analysis for indexing, but some of my fields contain single characters like
>> "l" and "L". These are being indexed into the dictionary, so for a query
>> like "delll" it suggests "de" and "l l l" as the spelling correction, because
>> my index has "de" and "l" as terms in those fields.
>> Please help.
>>
>> thanks
>> regards
>> Rohan
>>
>>
>
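
The LengthFilterFactory approach James describes above could look roughly like the following in schema.xml. This is only a minimal sketch under stated assumptions: the field type name text_spell, the field names spell and title, and the choice of tokenizer and lowercase filter are all hypothetical and not taken from the thread. The point is the min="2" setting, which drops single-character tokens before they reach the terms dictionary that DirectSolrSpellChecker and WordBreakSolrSpellChecker read from.

```xml
<!-- schema.xml: hypothetical field type used only for spellchecking -->
<fieldType name="text_spell" class="solr.TextField" positionIncrementGap="100">
  <analyzer>
    <tokenizer class="solr.StandardTokenizerFactory"/>
    <filter class="solr.LowerCaseFilterFactory"/>
    <!-- keep only tokens of 2 to 255 characters, so a lone "l" is never indexed -->
    <filter class="solr.LengthFilterFactory" min="2" max="255"/>
  </analyzer>
</fieldType>

<!-- hypothetical spellcheck-only field, filled from an assumed "title" field -->
<field name="spell" type="text_spell" indexed="true" stored="false" multiValued="true"/>
<copyField source="title" dest="spell"/>
```

Both spellcheckers would then need their "field" parameter pointed at this field, and the documents reindexed, since terms already in the index are not removed by a schema change alone.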
