lucene-solr-user mailing list archives

From Shawn Heisey <apa...@elyograg.org>
Subject Re: Strip out punctuation at the end of token
Date Thu, 23 Nov 2017 15:21:29 GMT
On 11/23/2017 8:06 AM, marotosg wrote:
> I am trying to strip out any "." at the end of a token, but I would like to
> keep the original token as well.
> This is my index analyzer:
> <analyzer type="index">
>   <tokenizer class="solr.WhitespaceTokenizerFactory"/>
>   <filter class="solr.WordDelimiterFilterFactory"
>           generateWordParts="1" generateNumberParts="1" catenateWords="1"
>           catenateNumbers="0" catenateAll="0" splitOnCaseChange="1"
>           preserveOriginal="1"/>
>   <filter class="solr.ASCIIFoldingFilterFactory" preserveOriginal="false"/>
>   <filter class="solr.LowerCaseFilterFactory"/>
> </analyzer>
> 
> I was thinking of using solr.PatternReplaceFilterFactory, but I see that
> it won't keep the original token.

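The WordDelimiterFilterFactory that you have configured will already do
that.  It drops leading and trailing delimiters such as "." from each
token, and preserveOriginal="1" keeps the unmodified token in the index
alongside the stripped one.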

Here I have taken your analysis chain, added it to a test install of 
Solr, and tried it out.  It appears to be doing exactly what you want it 
to do.

https://www.dropbox.com/s/5puf7rzbypdcspu/wdf-analysis-marotosg.png?dl=0
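
If you want to try this yourself, a stripped-down field type along these
lines should be enough to see the behavior on the Analysis page of the
admin UI.  This is only a sketch: the field type name "text_keep_dot" is
made up for illustration, and I have left out the options from your
chain that do not matter for the trailing "." case.  With a value such
as "Inc." I would expect the index side to show both "inc." and "inc".

```xml
<!-- Hypothetical field type name, for illustration only. -->
<fieldType name="text_keep_dot" class="solr.TextField">
  <analyzer type="index">
    <!-- Split on whitespace only, so "Inc." reaches the next filter intact. -->
    <tokenizer class="solr.WhitespaceTokenizerFactory"/>
    <!-- generateWordParts emits the part without the trailing delimiter;
         preserveOriginal also keeps the token exactly as it was received. -->
    <filter class="solr.WordDelimiterFilterFactory"
            generateWordParts="1" preserveOriginal="1"/>
    <filter class="solr.LowerCaseFilterFactory"/>
  </analyzer>
</fieldType>
```

Your full chain adds number handling, catenation, and ASCII folding on
top of this, none of which should change how the trailing "." is treated.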

Thanks,
Shawn
