lucene-solr-user mailing list archives

From Emir Arnautović <emir.arnauto...@sematext.com>
Subject Re: Advice on Stemming in Solr
Date Thu, 02 Nov 2017 09:46:35 GMT
Hi Edwin,
It seems it would be best not to apply the *ing stemming rule at all. One idea is to trick
the stemmer: replace the "ing" ending of any word with some non-existent character combination,
e.g. ‘wqx’, so the stemmer no longer recognises the suffix. You can use solr.PatternReplaceFilterFactory
to do that. If you want the proper token in the index, you can switch it back after stemming.
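
A minimal Python sketch of the masking trick, only to show the order of operations. It does not use Solr: toy_stem is a hypothetical stand-in for a suffix stemmer (it is not KStem), and MASK and analyze are made-up names. In a Solr analyzer chain the two re.sub calls would presumably correspond to one solr.PatternReplaceFilterFactory before the stemmer (something like pattern="ing$" replacement="wqx") and an optional second one after it (pattern="wqx$" replacement="ing"); treat those attribute values as an untested assumption.

```python
import re

MASK = "wqx"  # hypothetical placeholder that no real word ends with


def toy_stem(token: str) -> str:
    # Hypothetical suffix stemmer (NOT KStem): strips -ing, then a plural -s.
    if token.endswith("ing") and len(token) > 5:
        return token[:-3]
    if token.endswith("s") and len(token) > 3:
        return token[:-1]
    return token


def analyze(token: str, restore: bool = True) -> str:
    # 1. Before the stemmer: rewrite a trailing "ing" to the placeholder.
    masked = re.sub(r"ing$", MASK, token)
    # 2. The stemmer no longer sees an -ing ending, so that rule never fires.
    stemmed = toy_stem(masked)
    # 3. Optionally, after the stemmer: turn the placeholder back into "ing".
    return re.sub(MASK + r"$", "ing", stemmed) if restore else stemmed
```

With this chain "ximenting" survives intact while other rules (here the plural -s) still apply, e.g. "cats" becomes "cat". Note that English words such as "walking" are also left unstemmed, which is exactly the trade-off of dropping the *ing rule altogether.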

HTH,
Emir 
--
Monitoring - Log Management - Alerting - Anomaly Detection
Solr & Elasticsearch Consulting Support Training - http://sematext.com/



> On 2 Nov 2017, at 03:23, Zheng Lin Edwin Yeo <edwinyeozl@gmail.com> wrote:
> 
> Hi Emir,
> 
> We do have quite a lot of words that should not be stemmed. Currently,
> KStemFilterFactory is stemming all the non-English words that end with
> "ing" as well. There are quite a lot of places and names that end in
> "ing", and all of these are being stemmed too, which leads to
> inaccurate search results.
> 
> Regards,
> Edwin
> 
> 
> On 1 November 2017 at 18:20, Emir Arnautović <emir.arnautovic@sematext.com> wrote:
> 
>> Hi Edwin,
>> If the number of words that should not be stemmed is not high, you could
>> use KeywordMarkerFilterFactory to flag those words as keywords; that
>> should prevent the stemmer from changing them.
>> Depending on what you want to achieve, you might not be able to avoid
>> using a stemmer at index time. If you want a search for "walk" to find
>> documents that contain only "walking", then you have to stem at index
>> time. Cases where stemming is applied at query time only are rare and
>> specific. If you want to prefer exact matches over stemmed matches, you
>> have to index the same content both with and without stemming and boost
>> matches on the unstemmed field.
>> 
>> HTH,
>> Emir
>> 
>> 
>> 
>>> On 1 Nov 2017, at 10:11, Zheng Lin Edwin Yeo <edwinyeozl@gmail.com> wrote:
>>> 
>>> Hi,
>>> 
>>> We are currently using KStemFilterFactory in Solr, but we found that it is
>>> actually stemming non-English words like "ximenting", which it stems to
>>> "ximent". This is not what we want.
>>>
>>> Another option is to use HunspellStemFilterFactory, but some English
>>> words like "running" and "walking" are not being stemmed.
>>>
>>> We would like to check: is it advisable to use stemming at index time? Or
>>> should we not stem at index time, but instead, at query time, also search
>>> for the stemmed words? For example, if the user searches for "walking",
>>> we would search for "walk" as well, and the exact word "walking" would
>>> have a higher weight.
>>> 
>>> I'm currently using Solr 6.5.1.
>>> 
>>> Regards,
>>> Edwin
>> 
>> 
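
The KeywordMarkerFilterFactory suggestion from the 1 November reply can be modelled the same way. This is a hypothetical Python sketch, not Lucene code: the protected set stands in for the word list the filter reads (typically a file named in its protected attribute), lowercasing mimics ignoreCase="true", and toy_stem is again a made-up stemmer rather than KStem. The real filter sets a keyword flag on the token, and stemmers that respect that flag leave the token unchanged.

```python
def toy_stem(token: str) -> str:
    # Hypothetical suffix stemmer (NOT KStem): strips a trailing -ing.
    return token[:-3] if token.endswith("ing") and len(token) > 5 else token


def analyze(tokens, protected):
    """Mark protected tokens as keywords, then stem everything else.

    `protected` stands in for the word list a keyword-marker filter reads;
    matching is case-insensitive, mimicking ignoreCase="true".
    """
    protected = {w.lower() for w in protected}
    out = []
    for token in tokens:
        is_keyword = token.lower() in protected  # flag set before the stemmer runs
        out.append(token if is_keyword else toy_stem(token))
    return out
```

For example, analyze(["ximenting", "walking"], {"ximenting"}) keeps "ximenting" but still reduces "walking" to "walk". This stays practical only while the protected list is short, which is the caveat in the reply: every affected place or name has to be listed.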


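For preferring exact matches over stemmed ones, the same reply suggests indexing the content twice and boosting the unstemmed field. The sketch below models that in plain Python with a simple additive score instead of Solr's relevance scoring: each document gets an exact term set and a stemmed term set at index time, and each query term adds exact_boost for an exact match plus stemmed_boost for a stemmed match. All names and the 3.0/1.0 weights are hypothetical; in Solr this would typically be two fields filled via copyField and searched with edismax using something like qf="text_exact^3 text_stemmed", where those field names are assumptions.

```python
from dataclasses import dataclass


def toy_stem(token: str) -> str:
    # Hypothetical suffix stemmer (NOT KStem): strips a trailing -ing.
    return token[:-3] if token.endswith("ing") and len(token) > 5 else token


@dataclass
class Doc:
    doc_id: str
    exact_terms: set    # like an unstemmed field (e.g. a hypothetical text_exact)
    stemmed_terms: set  # like a stemmed copy of the same content


def index(doc_id: str, text: str) -> Doc:
    # Index the same content twice: once as-is and once stemmed.
    exact = set(text.lower().split())
    return Doc(doc_id, exact, {toy_stem(t) for t in exact})


def score(doc: Doc, query: str, exact_boost: float = 3.0, stemmed_boost: float = 1.0) -> float:
    total = 0.0
    for term in query.lower().split():
        if term in doc.exact_terms:              # match on the unstemmed field
            total += exact_boost
        if toy_stem(term) in doc.stemmed_terms:  # match on the stemmed field
            total += stemmed_boost
    return total


def search(docs, query):
    # Return matching doc ids, best score first.
    scored = [(score(d, query), d.doc_id) for d in docs]
    scored.sort(key=lambda pair: pair[0], reverse=True)
    return [doc_id for s, doc_id in scored if s > 0]
```

With docs = [index("a", "I walk daily"), index("b", "Walking is fun")], a search for "walking" ranks "b" first, while a search for "walk" still finds "b", which contains only "walking". That second match exists only because stemming happened at index time, which is why stemming at query time alone rarely fits.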