lucene-solr-user mailing list archives

From Lee Carroll <lee.a.carr...@googlemail.com>
Subject char filter factory and tokeniser issue in admin Analysis form
Date Tue, 20 Oct 2015 14:21:11 GMT
Hi,

On Solr 4.7 I've run into a strange issue. Whilst setting up a field, I
noticed in the admin Analysis form that when I use a char filter factory
(for example HTMLStripCharFilterFactory, shown as HTMLSCF) with a tokeniser
(StandardTokenizerFactory, shown as ST), the analysis chain grinds to a
halt. The char filter does not seem to pass anything on to the tokeniser.

Field type is:

<fieldType name="clean_text" class="solr.TextField" positionIncrementGap="100">
  <analyzer>
    <charFilter class="solr.HTMLStripCharFilterFactory"/>
    <tokenizer class="solr.StandardTokenizerFactory"/>
    <filter class="solr.WordDelimiterFilterFactory"
            generateWordParts="1" generateNumberParts="1"
            catenateWords="1" catenateNumbers="1"
            catenateAll="0" splitOnCaseChange="0"/>
    <filter class="solr.LowerCaseFilterFactory"/>
    <filter class="solr.SnowballPorterFilterFactory" language="English"/>
  </analyzer>
</fieldType>

Output of the analysis screen is:

Field value (index)
Content with mark up <br /> should be cleaned

HTMLSCF > Content with mark up should be cleaned
ST > <BLANK>
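
One way to check whether the blank tokeniser stage is only a display problem in the admin form is to send the same input straight to Solr's field analysis request handler and read the raw response. The sketch below only builds the request URL. The host, port, and core name (`localhost:8983`, `collection1`) are assumptions, not taken from the setup described here, and the handler is assumed to be registered at `/analysis/field` as in the stock example configuration.

```python
from urllib.parse import urlencode


def field_analysis_url(base_url, core, field_type, value):
    """Build a URL for Solr's field analysis request handler.

    base_url and core are placeholders for a local install (assumed);
    field_type and value come from the schema and the text under test.
    """
    params = {
        "analysis.fieldtype": field_type,   # analyse by field type name
        "analysis.fieldvalue": value,       # text run through the index-time chain
        "wt": "json",                       # ask for a JSON response
    }
    return f"{base_url}/{core}/analysis/field?{urlencode(params)}"


url = field_analysis_url(
    "http://localhost:8983/solr",  # assumed host and port
    "collection1",                 # assumed core name
    "clean_text",
    "Content with mark up <br /> should be cleaned",
)
print(url)
```

Opening the printed URL in a browser, or fetching it with any HTTP client, should return the per-stage output of the chain, which can then be compared with what the Analysis form displays.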

I know I must be missing something obvious!

Cheers Lee C