lucene-solr-user mailing list archives

From Steve Rowe <sar...@gmail.com>
Subject Re: How to use the StandardTokenizer with currency
Date Tue, 06 Dec 2016 21:26:47 GMT
Cool, thanks for letting us know (and sorry about the typo!)

--
Steve
www.lucidworks.com

> On Dec 6, 2016, at 4:15 PM, Vinay B, <vybe3142@gmail.com> wrote:
> 
> Yes, that works (apart from the typo in PatternReplaceCharFilterFactory)
> 
> Here is my config
> 
> <!-- VB - Just like text_general, but supports $ currency matching and
> autoGeneratePhraseQueries -->
> <fieldType name="text_curr_3" class="solr.TextField"
> positionIncrementGap="100" autoGeneratePhraseQueries="true">
>  <analyzer type="index">
>    <charFilter class="solr.MappingCharFilterFactory"
> mapping="mapping.txt"/>
>    <charFilter class="solr.PatternReplaceCharFilterFactory" pattern="\$"
> replacement="xxdollarxx"/>
>    <tokenizer class="solr.StandardTokenizerFactory"/>
>    <filter class="solr.PatternReplaceFilterFactory" pattern="xxdollarxx"
> replacement="\$" replace="all"/>
>    <filter class="solr.StopFilterFactory" ignoreCase="true"
> words="stopwords.txt" enablePositionIncrements="true"/>
>    <filter class="solr.WordDelimiterFilterFactory" generateWordParts="1"
> generateNumberParts="1" catenateWords="1" catenateNumbers="1"
> catenateAll="0" splitOnCaseChange="1" types="word-delim-types.txt" />
>    <filter class="solr.LowerCaseFilterFactory"/>
> </analyzer>
>  <analyzer type="query">
>    <charFilter class="solr.MappingCharFilterFactory"
> mapping="mapping.txt"/>
>    <charFilter class="solr.PatternReplaceCharFilterFactory" pattern="\$"
> replacement="xxdollarxx"/>
>    <tokenizer class="solr.StandardTokenizerFactory"/>
>    <filter class="solr.PatternReplaceFilterFactory" pattern="xxdollarxx"
> replacement="\$" replace="all"/>
>    <filter class="solr.SynonymFilterFactory" synonyms="synonyms.txt"
> ignoreCase="true" expand="true"/>
>    <filter class="solr.StopFilterFactory" ignoreCase="true"
> words="stopwords.txt" enablePositionIncrements="true"/>
>    <filter class="solr.WordDelimiterFilterFactory" generateWordParts="1"
> generateNumberParts="1" catenateWords="0" catenateNumbers="0"
> catenateAll="0" splitOnCaseChange="1"  types="word-delim-types.txt" />
>    <filter class="solr.LowerCaseFilterFactory"/>
>  </analyzer>
> </fieldType>
> 
> On Wed, Nov 30, 2016 at 2:08 PM, Steve Rowe <sarowe@gmail.com> wrote:
> 
>> Hi Vinay,
>> 
>> You should be able to use a char filter to convert “$” characters into
>> something that will survive tokenization, and then a token filter to
>> convert it back.
>> 
>> Something like this (untested):
>> 
>>  <analyzer>
>>    <charFilter class="solr.PatternReplaceCharFiterFactory"
>>                pattern="\$"
>>                replacement="__dollar__"/>
>>    <tokenizer class="solr.StandardTokenizerFactory"/>
>>    <filter class="solr.PatternReplaceFilterFactory"
>>            pattern="__dollar__"
>>            replacement="\$"
>>            replace="all"/>
>>  </analyzer>
>> 
>> --
>> Steve
>> www.lucidworks.com
>> 
>>> On Nov 30, 2016, at 1:58 PM, Vinay B, <vybe3142@gmail.com> wrote:
>>> 
>>> Prior discussion at
>>> http://stackoverflow.com/questions/40877567/using-standardtokenizerfactory-with-currency
>>> 
>>> I'd like to maintain the other aspects of the StandardTokenizer's
>>> functionality, but I'm wondering whether the task boils down to
>>> instructing the StandardTokenizer not to discard the $ symbol, or
>>> whether there is another way. I'm hoping this is possible with
>>> configuration rather than code changes.
>>> 
>>> Thanks
>> 
>> 
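The placeholder round-trip described above (char filter before tokenization, token filter after) can be simulated outside Solr. Here is a rough Python sketch, where `re.findall(r'\w+', ...)` is only a crude stand-in for StandardTokenizer's UAX#29 word rules, and `xxdollarxx` is the marker from the working config:

```python
import re

def analyze(text):
    # Char-filter stage: protect "$" before tokenization, since a
    # StandardTokenizer-style tokenizer discards punctuation like "$".
    protected = re.sub(r'\$', 'xxdollarxx', text)
    # Crude stand-in for StandardTokenizer: keep runs of word characters.
    tokens = re.findall(r'\w+', protected)
    # Token-filter stage: restore "$" in each surviving token.
    return [tok.replace('xxdollarxx', '$') for tok in tokens]

print(analyze("He paid $100 for the book"))
# → ['He', 'paid', '$100', 'for', 'the', 'book']
```

Because `xxdollarxx100` is a single alphanumeric run, it survives tokenization as one token, and the final replacement restores `$100` intact, which is exactly what the Solr chain achieves.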

