lucene-solr-user mailing list archives

From Mike Thomsen <mikerthom...@gmail.com>
Subject Phrase query not matching exact tokens in some cases
Date Tue, 14 Jul 2015 13:58:09 GMT
For the query "police office", our users are getting back highlighted
results for "police office*r*" (and "police office*rs*"). I get why a search
for "police officers" would also match just "office", since the stemmer would
cause that behavior. However, I don't understand why "office" is matching
"officer" here when no fuzzy matching is being done. Is that also a result
of our stemmer?

Here's the text field we're using:

<fieldType name="text_en_splitting" class="solr.TextField"
           positionIncrementGap="100" autoGeneratePhraseQueries="true">
    <analyzer type="index">
        <tokenizer class="solr.StandardTokenizerFactory"/>
        <filter class="solr.LowerCaseFilterFactory"/>
        <filter class="solr.WordDelimiterFilterFactory"
                generateWordParts="1" generateNumberParts="1"
                catenateWords="1" catenateNumbers="1" catenateAll="0"
                splitOnCaseChange="1"/>
        <filter class="solr.ManagedStopFilterFactory"
                managed="english"/>
        <filter class="solr.KeywordMarkerFilterFactory"
                protected="protwords.txt"/>
        <filter class="solr.HunspellStemFilterFactory"
                dictionary="en_US.dic"
                affix="en_US.aff"
                ignoreCase="false"
                longestOnly="false"/>
        <filter class="solr.PhoneticFilterFactory"
                encoder="RefinedSoundex" inject="true"/>
        <filter class="solr.RemoveDuplicatesTokenFilterFactory"/>
    </analyzer>
    <analyzer type="query">
        <tokenizer class="solr.StandardTokenizerFactory"/>
        <filter class="solr.LowerCaseFilterFactory"/>
        <filter class="solr.WordDelimiterFilterFactory"
                generateWordParts="1" generateNumberParts="1"
                catenateWords="1" catenateNumbers="1" catenateAll="0"
                splitOnCaseChange="1"/>
        <filter class="solr.ManagedSynonymFilterFactory"
                managed="english"/>
        <filter class="solr.ManagedStopFilterFactory"
                managed="english"/>
        <filter class="solr.KeywordMarkerFilterFactory"
                protected="protwords.txt"/>
        <filter class="solr.HunspellStemFilterFactory"
                dictionary="en_US.dic"
                affix="en_US.aff"
                ignoreCase="false"
                longestOnly="false"/>
        <filter class="solr.PhoneticFilterFactory"
                encoder="RefinedSoundex" inject="true"/>
        <filter class="solr.RemoveDuplicatesTokenFilterFactory"/>
    </analyzer>
</fieldType>
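
In case it helps anyone reproduce this, here is a rough sketch of how the two analyzer chains could be compared through Solr's field analysis handler. It only builds the request URL. The host, port, and core name ("collection1") are placeholders rather than our real setup, and it assumes the handler is reachable at /analysis/field on the core, which may need to be registered in solrconfig.xml depending on the Solr version.

```python
from urllib.parse import urlencode


def build_analysis_url(base_url, core, field_type, indexed_text, query_text):
    """Build a field-analysis request URL for one field type.

    The response lists the tokens produced by each stage of the index
    and query analyzers, so any stage where "officer" and "office" end
    up as the same token should be visible.
    """
    params = {
        "analysis.fieldtype": field_type,      # analyze by field type name
        "analysis.fieldvalue": indexed_text,   # run through the index analyzer
        "analysis.query": query_text,          # run through the query analyzer
        "analysis.showmatch": "true",          # flag index tokens matching query tokens
        "wt": "json",
    }
    return f"{base_url.rstrip('/')}/{core}/analysis/field?{urlencode(params)}"


if __name__ == "__main__":
    # Placeholder host and core name; substitute your own.
    print(build_analysis_url(
        "http://localhost:8983/solr",
        "collection1",
        "text_en_splitting",
        "police officer",
        "police office",
    ))
```

Opening the printed URL in a browser (or the Analysis screen in the admin UI, which shows the same information) should make it possible to see which filter, if any, maps "officer" onto the same token as "office".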

Thanks,

Mike
