lucene-solr-user mailing list archives

From Scott Stults <sstu...@opensourceconnections.com>
Subject Re: PatternReplaceFilterFactory problem
Date Mon, 28 Jan 2019 12:37:08 GMT
Hi Chris,

You've included the definition of the text_en field type, but your queries
search the field "text", which is of type text_general. That may be the
source of your problem: copyField copies the raw source value, so the
destination field's own analyzer (text_general here) is what gets applied.
If looking into that doesn't help, send the definition of text_general as
well.
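
To illustrate the mismatch: the PatternReplaceFilterFactory sits in the
text_en chain, but the "text" field never runs that chain. A minimal sketch
of one possible fix, assuming it is acceptable for the catch-all field to
share the text_en analysis (the multiValued attribute is my addition, on the
assumption that it is needed once two sources are copied into a field whose
type does not declare it):

```xml
<!-- Sketch only: point the copyField destination at text_en so the
     PatternReplaceFilterFactory runs for "text" at index and query time. -->
<!-- multiValued="true" is an assumption: companyName and jobTitle are both
     copied into this one field, and the text_en type shown in this thread
     does not declare multiValued itself. -->
<field name="text" type="text_en" indexed="true" stored="false" multiValued="true"/>

<copyField source="companyName" dest="text"/>
<copyField source="jobTitle"    dest="text"/>
```

A schema change like this needs a full reindex before it takes effect.
Alternatively, a query such as "companyName:(mydomain)" should exercise the
text_en chain without touching the schema, and the Analysis screen in the
Solr admin UI can show what each filter emits for "mydomain.com".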

Hope that helps!

-Scott

On Mon, Jan 28, 2019 at 6:02 AM Chris Wareham <
chris.wareham@graduate-jobs.com> wrote:

> I'm trying to index some data which often includes domain names. I'd
> like to remove the .com TLD, so I have modified the text_en field type
> by adding a PatternReplaceFilterFactory filter. However, it doesn't
> appear to be working as a search for "text:(mydomain.com)" matches
> records but "text:(mydomain)" does not.
>
>    <fieldType name="text_en" class="solr.TextField" positionIncrementGap="100">
>      <analyzer type="index">
>        <tokenizer class="solr.StandardTokenizerFactory"/>
>        <filter class="solr.SynonymGraphFilterFactory" expand="true" ignoreCase="true" synonyms="synonyms.txt"/>
>        <filter class="solr.StopFilterFactory" words="stopwords.txt" ignoreCase="true"/>
>        <filter class="solr.LowerCaseFilterFactory"/>
>        <filter class="solr.PatternReplaceFilterFactory" pattern="([-a-z])\.com" replacement="$1"/>
>        <filter class="solr.EnglishPossessiveFilterFactory"/>
>        <filter class="solr.KeywordMarkerFilterFactory" protected="protwords.txt"/>
>        <filter class="solr.PorterStemFilterFactory"/>
>      </analyzer>
>      <analyzer type="query">
>        <tokenizer class="solr.StandardTokenizerFactory"/>
>        <filter class="solr.SynonymGraphFilterFactory" expand="true" ignoreCase="true" synonyms="synonyms.txt"/>
>        <filter class="solr.StopFilterFactory" words="stopwords.txt" ignoreCase="true"/>
>        <filter class="solr.LowerCaseFilterFactory"/>
>        <filter class="solr.PatternReplaceFilterFactory" pattern="([-a-z])\.com" replacement="$1"/>
>        <filter class="solr.EnglishPossessiveFilterFactory"/>
>        <filter class="solr.KeywordMarkerFilterFactory" protected="protwords.txt"/>
>        <filter class="solr.PorterStemFilterFactory"/>
>      </analyzer>
>    </fieldType>
>
> The actual field definitions are as follows:
>
>    <field name="companyName" type="text_en"      indexed="true" stored="true"  required="true"/>
>    <field name="jobTitle"    type="text_en"      indexed="true" stored="true"  required="true"/>
>    <field name="text"        type="text_general" indexed="true" stored="false"/>
>
>    <copyField source="companyName" dest="text" />
>    <copyField source="jobTitle"    dest="text" />
>


-- 
Scott Stults | Founder & Solutions Architect | OpenSource Connections, LLC
| 434.409.2780
http://www.opensourceconnections.com
