lucene-solr-user mailing list archives

From Emir Arnautović <emir.arnauto...@sematext.com>
Subject Re: Strip out punctuation at the end of token
Date Thu, 23 Nov 2017 15:15:25 GMT
Hi Sergio,
You can use PatternCaptureGroupFilterFactory to emit both the original token and the
stripped one. This token filter is missing from recent documentation, but it is still
available.
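
A minimal sketch of what such a filter could look like, assuming a pattern that captures
everything before one or more trailing dots. The regex and the position of the filter in
the chain are illustrative assumptions, not taken from the original analyzer, so check
the result in the Solr analysis screen before relying on it:

```xml
<analyzer type="index">
  <tokenizer class="solr.WhitespaceTokenizerFactory"/>
  <!-- Assumed pattern: group 1 is the token without its trailing "." characters.
       preserve_original="true" keeps the unmodified token as well. -->
  <filter class="solr.PatternCaptureGroupFilterFactory"
          pattern="^(.+?)\.+$" preserve_original="true"/>
  <filter class="solr.LowerCaseFilterFactory"/>
</analyzer>
```

With this pattern, a token such as "Inc." should be emitted both as "Inc." and as "Inc",
while tokens without a trailing dot do not match and pass through unchanged.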

HTH,
Emir
--
Monitoring - Log Management - Alerting - Anomaly Detection
Solr & Elasticsearch Consulting Support Training - http://sematext.com/



> On 23 Nov 2017, at 16:06, marotosg <marotosg@gmail.com> wrote:
> 
> Hi all,
> 
> I am trying to strip out any "." at the end of a token, but I would like to
> keep the original token as well.
> This is my index analyzer
> <analyzer type="index">
>         <tokenizer class="solr.WhitespaceTokenizerFactory"/>
>          <filter class="solr.WordDelimiterFilterFactory"
> generateWordParts="1" generateNumberParts="1" catenateWords="1"
> catenateNumbers="0" catenateAll="0" splitOnCaseChange="1"
> preserveOriginal="1"/>
>          <filter class="solr.ASCIIFoldingFilterFactory"
> preserveOriginal="false"/>
>          <filter class="solr.LowerCaseFilterFactory"/>
> </analyzer>
> 
> I was thinking of using solr.PatternReplaceFilterFactory, but I see that it
> won't keep the original token.
> 
> Any help?
> 
> Thanks a lot
> Sergio Maroto
> 
> 
> 
> 
> --
> Sent from: http://lucene.472066.n3.nabble.com/Solr-User-f472068.html

