lucene-solr-user mailing list archives

From Muhammad Zahid Iqbal <zahid.iq...@northbaysolutions.net>
Subject Re: Indexing word with plus sign
Date Mon, 22 May 2017 12:26:05 GMT
Hi,


Before the tokenizer runs, you can replace the special symbol with a
placeholder string so that it survives tokenization, and then replace the
placeholder with the original symbol after the text has been tokenized.

For example:
<charFilter class="solr.PatternReplaceCharFilterFactory" pattern="(\+)"
replacement="xxx" />
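
A fuller sketch of the same idea, showing where the char filter sits and how
the placeholder could be turned back into "+" after tokenizing. The field
type name "text_keep_plus" is made up for illustration, and I have not tested
this on Solr 5.5, so please check it on the Analysis screen first:

```xml
<!-- Hypothetical field type name; adapt it to your own schema. -->
<fieldType name="text_keep_plus" class="solr.TextField" positionIncrementGap="100">
  <analyzer>
    <!-- Runs before the tokenizer: every "+" becomes the placeholder "xxx",
         so 'i+d' reaches the tokenizer as 'ixxxd'. -->
    <charFilter class="solr.PatternReplaceCharFilterFactory" pattern="\+" replacement="xxx"/>
    <tokenizer class="solr.StandardTokenizerFactory"/>
    <!-- Runs on each token after tokenizing: 'ixxxd' becomes 'i+d' again. -->
    <filter class="solr.PatternReplaceFilterFactory" pattern="xxx" replacement="+" replace="all"/>
    <filter class="solr.LowerCaseFilterFactory"/>
  </analyzer>
</fieldType>
```

Because there is a single analyzer, the same chain applies at index time and
at query time, so a reindex is needed after the change. Note that any real
word containing "xxx" would also be rewritten, and a standalone "+" would
become a token of its own, so a less common placeholder may be safer.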


Thanks,
Zahid iqbal

On Mon, May 22, 2017 at 12:57 AM, Fundera Developer <
funderadeveloper@outlook.com> wrote:

> Hi all,
>
> I am a bit stuck on a problem that I feel must be easy to solve. In
> Spanish it is common to find the term 'i+d'. We are working with Solr 5.5,
> and StandardTokenizer splits it into 'i' and 'd'. Our index holds documents
> in both Spanish and Catalan, and 'i' is a frequent word in Catalan, so a
> user who searches for 'i+d' sometimes gets Catalan documents as results.
>
> I have tried to use the SynonymFilter, with something like:
>
> i+d => investigacionYdesarrollo
>
> But it does not seem to change anything.
>
> Is there a way I could set an exception in the tokenizer so that it does
> not split this word?
>
> Thanks in advance!
>
>
