lucene-solr-user mailing list archives

From <>
Subject order of analyzers, tokenizers and filters
Date Tue, 14 Sep 2010 11:37:04 GMT
this is the second time i have stumbled across some strange behaviour:

in my schema.xml i have defined 

    <fieldType name="textspell" class="solr.TextField">
      <analyzer type="index">
        <!-- sg324 inkl. HTMLStrip... -->
        <charFilter class="solr.HTMLStripCharFilterFactory" />
        <filter class="solr.PatternReplaceFilterFactory" pattern="/"
                replacement=" / " replace="all"/>
        <tokenizer class="solr.WhitespaceTokenizerFactory"/>
        <filter class="solr.StopFilterFactory" ignoreCase="true"
                words="stopwords.txt" enablePositionIncrements="true" />
        <filter class="solr.WordDelimiterFilterFactory"
                generateWordParts="1" generateNumberParts="1" catenateWords="1"
                catenateNumbers="1" catenateAll="0" splitOnCaseChange="0"/>
        <filter class="solr.StopFilterFactory" ignoreCase="true"
                words="stopwords_spelling.txt" enablePositionIncrements="true" />
        <filter class="solr.LowerCaseFilterFactory"/>

i can't get the PatternReplaceFilter to run before the WhitespaceTokenizer.
my schema is as shown above and i reloaded the core, but on the analysis
page of the admin interface i can see that the WhitespaceTokenizer is
executed before the PatternReplaceFilter.
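
if the analysis page is right and the tokenizer always runs before any
<filter>, no matter where the <filter> is written, then i guess the
replacement would have to happen at the charFilter stage instead. a rough
sketch of what i am after is below. it assumes that char filters run on the
raw text before the tokenizer, and that solr.PatternReplaceCharFilterFactory
(with pattern and replacement attributes) is available in the installed
version; i have not verified either for my setup:

    <analyzer type="index">
      <charFilter class="solr.HTMLStripCharFilterFactory"/>
      <!-- assumption: this factory exists in the installed Solr version -->
      <charFilter class="solr.PatternReplaceCharFilterFactory"
                  pattern="/" replacement=" / "/>
      <tokenizer class="solr.WhitespaceTokenizerFactory"/>
      <!-- remaining <filter> elements unchanged, in the order listed above -->
    </analyzer>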

is there a general order of execution?

