lucene-solr-user mailing list archives

From Dominique Bejean <dominique.bej...@eolya.fr>
Subject Re: Accent insensitive multi-words suggester
Date Tue, 08 Oct 2013 20:50:12 GMT
Thank you, Erick.
I will try this.

Regards
Dominique

On 06/10/13 03:03, Erick Erickson wrote:
> Consider implementing a special field of the form
> accentfolded|original
>
> For instance, you'd index something like
> ecole|école
> ecole|école privée
> as _terms_, not broken up at all.
>
> Now, when you send something to the suggester, whether "eco" or "éco",
> you fold it to "eco" too and get back these tokens.
> Then the app layer breaks them up and displays them pleasingly.
>
> Best
> Erick
>
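Erick's suggestion can be sketched in plain Python. This is a minimal illustration, assuming the application builds the `folded|original` strings itself, indexes each one as a single untokenized term, and later fetches candidates by folded prefix (for example with the TermsComponent's `terms.prefix` parameter). The helper names `fold`, `make_term` and `suggest` are hypothetical, and `fold` only approximates ASCIIFoldingFilter by removing Unicode combining marks, so ligatures such as "œ" are not expanded.

```python
import unicodedata


def fold(text: str) -> str:
    """Lowercase and strip accents (rough stand-in for ASCIIFoldingFilter + LowerCaseFilter)."""
    # NFD splits "é" into "e" + a combining acute accent; dropping combining marks leaves "e".
    decomposed = unicodedata.normalize("NFD", text)
    return "".join(ch for ch in decomposed if not unicodedata.combining(ch)).lower()


def make_term(phrase: str) -> str:
    """Build the single, unbroken 'folded|original' term to index (hypothetical helper)."""
    return f"{fold(phrase)}|{phrase}"


def suggest(terms: list[str], user_input: str, limit: int = 10) -> list[str]:
    """Prefix-match on the folded half; return the original (accented) half for display."""
    prefix = fold(user_input)
    # Split each indexed term back into its (folded, original) halves.
    pairs = [tuple(term.split("|", 1)) for term in terms]
    # Sort by folded text so shorter phrases come before their extensions.
    hits = sorted(pair for pair in pairs if pair[0].startswith(prefix))
    return [original for _, original in hits[:limit]]


if __name__ == "__main__":
    phrases = ["école", "école primaire", "école publique",
               "école privée", "école primaire privée", "collège"]
    terms = [make_term(p) for p in phrases]
    print(suggest(terms, "eco"))
    print(suggest(terms, "éco"))
```

Because the folded half comes first in each term, a plain prefix match on it is accent-insensitive, while the original half keeps the accents for display. The sketch assumes phrases never contain the `|` separator.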
> On Tue, Oct 1, 2013 at 5:45 PM, Dominique Bejean
> <dominique.bejean@eolya.fr> wrote:
>> Hi,
>>
>> So far, the best solution I have found to implement a multi-word
>> suggester is to use the "ShingleFilterFactory" filter at index time together
>> with the TermsComponent. The index-time analyzer was:
>>
>>        <analyzer type="index">
>>          <tokenizer class="solr.WhitespaceTokenizerFactory"/>
>>          <filter class="solr.ASCIIFoldingFilterFactory"/>
>>          <filter class="solr.ElisionFilterFactory" ignoreCase="true" articles="lang/contractions_fr.txt"/>
>>          <filter class="solr.StopFilterFactory" ignoreCase="true" words="stopwords.txt" />
>>          <filter class="solr.LowerCaseFilterFactory" />
>>          <filter class="solr.ShingleFilterFactory" maxShingleSize="4" outputUnigrams="true"/>
>>        </analyzer>
>>
>>
>> With the "ASCIIFoldingFilter" filter, it works fine if the user does not type
>> accents in query terms, but all suggestions are returned without accents.
>> Without the "ASCIIFoldingFilter" filter, it works fine if the user does not
>> forget accents in query terms, and all suggestions are returned with accents.
>>
>> Note: I use the StopFilter to avoid suggestions that include stop words,
>> particularly suggestions starting or ending with a stop word.
>>
>>
>> What I need is a suggester where the user may or may not type accents in
>> query terms, and the suggestions are always returned with accents.
>>
>> For example, if the user types "éco" or "eco", the suggester should return:
>>
>> école
>> école primaire
>> école publique
>> école privée
>> école primaire privée
>>
>>
>> I think this is impossible to achieve with the TermsComponent and that I
>> should use the SpellCheckComponent instead. However, I don't see how to make
>> the suggester accent-insensitive while still returning suggestions with accents.
>>
>> Has anybody already achieved this?
>>
>> Thank you.
>>
>> Dominique
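To see why neither variant of the index-time analyzer in the original question gives the requested behaviour, here is a rough Python simulation of that chain. It is a sketch under stated assumptions: the TermsComponent's `terms.prefix` is assumed to be matched against indexed terms as-is (no query-time analysis), elision and stop-word removal are left out to keep the focus on accents, `fold` approximates ASCIIFoldingFilter through Unicode NFD decomposition, and the function names and sample phrases are hypothetical.

```python
import unicodedata


def fold(token: str) -> str:
    """Strip accents via NFD decomposition (rough stand-in for ASCIIFoldingFilter)."""
    decomposed = unicodedata.normalize("NFD", token)
    return "".join(ch for ch in decomposed if not unicodedata.combining(ch))


def shingles(tokens: list[str], max_size: int) -> list[str]:
    """Unigrams plus word n-grams up to max_size, mimicking outputUnigrams="true"."""
    out = []
    for start in range(len(tokens)):
        for size in range(1, max_size + 1):
            if start + size > len(tokens):
                break
            out.append(" ".join(tokens[start:start + size]))
    return out


def analyze(text: str, ascii_folding: bool, max_shingle_size: int = 4) -> set[str]:
    """Whitespace split, optional folding, lowercase, shingles.

    Elision and stop-word removal are deliberately omitted from this sketch.
    """
    tokens = text.split()
    if ascii_folding:
        tokens = [fold(t) for t in tokens]
    tokens = [t.lower() for t in tokens]
    return set(shingles(tokens, max_shingle_size))


def prefix_lookup(index: set[str], prefix: str) -> list[str]:
    """Raw prefix match on indexed terms; the prefix itself is not analyzed (assumption)."""
    return sorted(term for term in index if term.startswith(prefix))


if __name__ == "__main__":
    docs = ["école primaire privée", "école publique"]
    for folding in (True, False):
        index = set().union(*(analyze(d, folding) for d in docs))
        print(folding, prefix_lookup(index, "eco"), prefix_lookup(index, "éco"))
```

With folding on, only the unaccented prefix matches and every suggestion comes back without accents; with folding off, it is the reverse. Indexing the folded form and the original form together in one term, as Erick suggests above, removes that trade-off.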

-- 
Dominique Béjean
+33 6 08 46 12 43
skype: dbejean
www.eolya.fr
www.crawl-anywhere.com

