lucene-solr-user mailing list archives

From Alexandre Rafalovitch <arafa...@gmail.com>
Subject Re: PatternReplaceFilterFactory problem
Date Mon, 28 Jan 2019 18:36:49 GMT
In the Admin UI, there is an Analysis screen. You can enter your text and
your query there and see what happens to each at every step of the
processing pipeline.

This should tell you whether the problem is in indexing, in querying, or
somewhere else entirely (e.g. you are querying a different field, as
Scott suggests).

Regards,
   Alex.
P.S. (Semi-)random tip of the day: if you copyField the content, it is
indexed and searched by the rules of the _target_ field. The source
field's own analysis chain is invoked only when you search on that
field directly.
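
To make that tip concrete against the schema quoted below: the "text"
field is declared with type text_general, so a query on text:(...) never
runs the text_en chain that contains the PatternReplaceFilterFactory.
Here is a minimal sketch of one possible fix. It assumes you want the
catch-all field to share the text_en analysis, which is a guess about
your intent. You could instead add the filter to text_general, or query
companyName directly.

```xml
<!-- Sketch only: give the copyField target the same type as its sources,
     so the text_en chain (including the .com pattern filter) is applied
     to the copied content. A schema change like this needs a full reindex. -->

<!-- multiValued="true" is an assumption added here: two sources are copied
     into one field, and the quoted text_en type does not declare it. -->
<field name="text" type="text_en" indexed="true" stored="false" multiValued="true"/>

<copyField source="companyName" dest="text"/>
<copyField source="jobTitle"    dest="text"/>
```

Before reindexing, you can check the result on the Analysis screen by
selecting the "text" field and entering "mydomain.com" as the index
value and "mydomain" as the query value.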

On Mon, 28 Jan 2019 at 06:02, Chris Wareham
<chris.wareham@graduate-jobs.com> wrote:
>
> I'm trying to index some data which often includes domain names. I'd
> like to remove the .com TLD, so I have modified the text_en field type
> by adding a PatternReplaceFilterFactory filter. However, it doesn't
> appear to be working as a search for "text:(mydomain.com)" matches
> records but "text:(mydomain)" does not.
>
>    <fieldType name="text_en" class="solr.TextField" positionIncrementGap="100">
>      <analyzer type="index">
>        <tokenizer class="solr.StandardTokenizerFactory"/>
>        <filter class="solr.SynonymGraphFilterFactory" expand="true" ignoreCase="true" synonyms="synonyms.txt"/>
>        <filter class="solr.StopFilterFactory" words="stopwords.txt" ignoreCase="true"/>
>        <filter class="solr.LowerCaseFilterFactory"/>
>        <filter class="solr.PatternReplaceFilterFactory" pattern="([-a-z])\.com" replacement="$1"/>
>        <filter class="solr.EnglishPossessiveFilterFactory"/>
>        <filter class="solr.KeywordMarkerFilterFactory" protected="protwords.txt"/>
>        <filter class="solr.PorterStemFilterFactory"/>
>      </analyzer>
>      <analyzer type="query">
>        <tokenizer class="solr.StandardTokenizerFactory"/>
>        <filter class="solr.SynonymGraphFilterFactory" expand="true" ignoreCase="true" synonyms="synonyms.txt"/>
>        <filter class="solr.StopFilterFactory" words="stopwords.txt" ignoreCase="true"/>
>        <filter class="solr.LowerCaseFilterFactory"/>
>        <filter class="solr.PatternReplaceFilterFactory" pattern="([-a-z])\.com" replacement="$1"/>
>        <filter class="solr.EnglishPossessiveFilterFactory"/>
>        <filter class="solr.KeywordMarkerFilterFactory" protected="protwords.txt"/>
>        <filter class="solr.PorterStemFilterFactory"/>
>      </analyzer>
>    </fieldType>
>
> The actual field definitions are as follows:
>
>    <field name="companyName" type="text_en"      indexed="true" stored="true"  required="true"/>
>    <field name="jobTitle"    type="text_en"      indexed="true" stored="true"  required="true"/>
>    <field name="text"        type="text_general" indexed="true" stored="false"/>
>
>    <copyField source="companyName" dest="text" />
>    <copyField source="jobTitle"    dest="text" />
