lucene-solr-user mailing list archives

From Erick Erickson <erickerick...@gmail.com>
Subject Re: Accent insensitive multi-words suggester
Date Sun, 06 Oct 2013 01:03:39 GMT
Consider implementing a special field of the form
accentfolded|original

For instance, you'd index something like
ecole|école
ecole privee|école privée
as _terms_, not broken up at all.
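
A minimal sketch of how such terms might be built before indexing, in Python. The helper names fold and make_suggest_term are made up for illustration. The NFD-based folding only approximates what ASCIIFoldingFilterFactory does: it lowercases and strips combining accents, but does not expand ligatures such as "œ". The folded half covers the whole phrase here, so that a prefix like "ecole pr" can still match.

```python
import unicodedata


def fold(text):
    """Lowercase and strip combining accents (rough stand-in for ASCII folding)."""
    decomposed = unicodedata.normalize("NFD", text.lower())
    return "".join(ch for ch in decomposed if not unicodedata.combining(ch))


def make_suggest_term(original):
    """Build one 'accentfolded|original' term; the original text is kept as-is."""
    return f"{fold(original)}|{original}"


if __name__ == "__main__":
    for phrase in ["école", "école privée"]:
        print(make_suggest_term(phrase))
```

Each resulting string would be indexed whole in a non-tokenized field, so the "|" and the accented half survive untouched.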

Now, when you send something to the suggester, say "eco" or "éco",
you fold it to "eco" as well and get back these tokens.
Then the app layer splits each token at the "|" and displays the
original, accented part.
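
The query side could be sketched like this in Python. This is an illustration under assumptions, not Solr code: the sorted list stands in for the indexed terms, and the prefix scan stands in for whatever prefix lookup the suggester performs (for example the TermsComponent with terms.prefix). The names fold, suggest and TERMS are hypothetical, and the sample phrases are the ones from the original question.

```python
import bisect
import unicodedata


def fold(text):
    """Lowercase and strip combining accents (rough stand-in for ASCII folding)."""
    decomposed = unicodedata.normalize("NFD", text.lower())
    return "".join(ch for ch in decomposed if not unicodedata.combining(ch))


def suggest(sorted_terms, user_input, limit=10):
    """Fold the input, prefix-scan the sorted terms, return the accented halves."""
    prefix = fold(user_input)
    start = bisect.bisect_left(sorted_terms, prefix)
    results = []
    for term in sorted_terms[start:]:
        if len(results) >= limit or not term.startswith(prefix):
            break
        # Split 'accentfolded|original' at the first '|' and keep the original.
        _, _, original = term.partition("|")
        results.append(original)
    return results


# Stand-in for the indexed field: one 'accentfolded|original' term per phrase.
TERMS = sorted(
    f"{fold(s)}|{s}"
    for s in [
        "école",
        "école primaire",
        "école publique",
        "école privée",
        "école primaire privée",
    ]
)

if __name__ == "__main__":
    print(suggest(TERMS, "éco"))
    print(suggest(TERMS, "eco"))
```

Results come back in index (code point) order of the folded terms, so the app layer would still need to sort or rank them before display.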

Best
Erick

On Tue, Oct 1, 2013 at 5:45 PM, Dominique Bejean
<dominique.bejean@eolya.fr> wrote:
> Hi,
>
> So far, the best solution I have found for implementing a multi-word
> suggester is to use the "ShingleFilterFactory" filter at index time together
> with the TermsComponent. The index-time analyzer was:
>
>       <analyzer type="index">
>         <tokenizer class="solr.WhitespaceTokenizerFactory"/>
>         <filter class="solr.ASCIIFoldingFilterFactory"/>
>         <filter class="solr.ElisionFilterFactory" ignoreCase="true" articles="lang/contractions_fr.txt"/>
>         <filter class="solr.StopFilterFactory" ignoreCase="true" words="stopwords.txt"/>
>         <filter class="solr.LowerCaseFilterFactory"/>
>         <filter class="solr.ShingleFilterFactory" maxShingleSize="4" outputUnigrams="true"/>
>       </analyzer>
>
>
> With the "ASCIIFoldingFilter" filter, it works fine if the user does not use
> accents in the query terms, but all suggestions come back without accents.
> Without the "ASCIIFoldingFilter" filter, it works fine if the user does not
> forget accents in the query terms, and all suggestions come back with accents.
>
> Note: I use the StopFilter to avoid suggestions that include stop words,
> particularly ones that start or end with a stop word.
>
>
> What I need is a suggester where the user can type the query terms with or
> without accents and the suggestions are always returned with accents.
>
> For example, if the user types "éco" or "eco", the suggester should return:
>
> école
> école primaire
> école publique
> école privée
> école primaire privée
>
>
> I think it is impossible to achieve this with the TermsComponent, so I
> should probably use the SpellCheckComponent instead. However, I don't see how
> to make the suggester accent-insensitive and still return the suggestions
> with accents.
>
> Has anybody already achieved that?
>
> Thank you.
>
> Dominique
