lucene-solr-user mailing list archives

From Alessandro Benedetti <benedetti.ale...@gmail.com>
Subject Re: Phrase query not matching exact tokens in some cases
Date Tue, 14 Jul 2015 14:23:22 GMT
Which kind of highlighter are you using?
In any case, this is down to your analysis chain.
It is a heavy analysis chain, and I can see "solr.HunspellStemFilterFactory"
in it.

If your field stores term vectors for the highlighter to use, then for each
document the term vector holds a mini inverted index produced by your
index-time analysis chain (i.e. "office" will be there, with the offset and
position of the original word in the text).
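For reference, term vectors are enabled per field in schema.xml. A minimal
sketch, assuming a hypothetical field named "body" that uses your
text_en_splitting type; the FastVectorHighlighter needs all three of
termVectors, termPositions and termOffsets:

```xml
<!-- Hypothetical field "body": term vectors with positions and offsets,
     built from the index-time analysis chain of text_en_splitting -->
<field name="body" type="text_en_splitting" indexed="true" stored="true"
       termVectors="true" termPositions="true" termOffsets="true"/>
```

Each token in such a term vector, including the stemmed ones, carries offsets
pointing back into the stored text, and those offsets are what the
highlighter marks up.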

If you are not storing term vectors and you are using a highlighter that
doesn't need them, each document's field will be analysed at highlighting
time (with the index-time analysis chain).
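As a sketch of that second case, assuming a hypothetical stored field "body"
without term vectors and the usual /select handler in solrconfig.xml, the
standard highlighter can be selected explicitly; it then re-analyses the
stored text of each returned document with the field's index-time analyzer:

```xml
<!-- Hypothetical /select defaults; "body" is an assumed stored field
     with no term vectors -->
<requestHandler name="/select" class="solr.SearchHandler">
  <lst name="defaults">
    <str name="hl">true</str>
    <str name="hl.fl">body</str>
    <!-- Standard Highlighter instead of the FastVectorHighlighter -->
    <bool name="hl.useFastVectorHighlighter">false</bool>
  </lst>
</requestHandler>
```

Either way the highlighted spans come from the same index-time tokens, so the
stemming and phonetic filters affect what gets marked.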

So your guess is correct: with such a heavily analysed field, highlighting
will behave that way.

Cheers


2015-07-14 14:58 GMT+01:00 Mike Thomsen <mikerthomsen@gmail.com>:

> For the query "police office" our users are getting back highlighted
> results for "police office*r*" (and "police office*rs*"). I get why a search
> for "police officers" would include just "office", since the stemmer would
> cause that behavior. However, I don't understand why "office" is matching
> "officer" here when no fuzzy matching is being done. Is that also a result
> of our stemmer?
>
> Here's the text field we're using:
>
> <fieldType name="text_en_splitting" class="solr.TextField"
>            positionIncrementGap="100" autoGeneratePhraseQueries="true">
>     <analyzer type="index">
>         <tokenizer class="solr.StandardTokenizerFactory"/>
>         <filter class="solr.LowerCaseFilterFactory"/>
>         <filter class="solr.WordDelimiterFilterFactory"
>                 generateWordParts="1" generateNumberParts="1"
>                 catenateWords="1" catenateNumbers="1"
>                 catenateAll="0" splitOnCaseChange="1"/>
>         <filter class="solr.ManagedStopFilterFactory"
>                 managed="english"/>
>         <filter class="solr.KeywordMarkerFilterFactory"
>                 protected="protwords.txt"/>
>         <filter class="solr.HunspellStemFilterFactory"
>                 dictionary="en_US.dic"
>                 affix="en_US.aff"
>                 ignoreCase="false"
>                 longestOnly="false" />
>         <filter class="solr.PhoneticFilterFactory"
>                 encoder="RefinedSoundex"
>                 inject="true"/>
>         <filter class="solr.RemoveDuplicatesTokenFilterFactory"/>
>     </analyzer>
>     <analyzer type="query">
>         <tokenizer class="solr.StandardTokenizerFactory"/>
>         <filter class="solr.LowerCaseFilterFactory"/>
>         <filter class="solr.WordDelimiterFilterFactory"
>                 generateWordParts="1" generateNumberParts="1"
>                 catenateWords="1" catenateNumbers="1"
>                 catenateAll="0" splitOnCaseChange="1"/>
>         <filter class="solr.ManagedSynonymFilterFactory"
>                 managed="english"/>
>         <filter class="solr.ManagedStopFilterFactory"
>                 managed="english"/>
>         <filter class="solr.KeywordMarkerFilterFactory"
>                 protected="protwords.txt"/>
>         <filter class="solr.HunspellStemFilterFactory"
>                 dictionary="en_US.dic"
>                 affix="en_US.aff"
>                 ignoreCase="false"
>                 longestOnly="false" />
>         <filter class="solr.PhoneticFilterFactory"
>                 encoder="RefinedSoundex"
>                 inject="true"/>
>         <filter class="solr.RemoveDuplicatesTokenFilterFactory"/>
>     </analyzer>
> </fieldType>
>
> Thanks,
>
> Mike
>



-- 
--------------------------

Benedetti Alessandro
Visiting card - http://about.me/alessandro_benedetti
Blog - http://alexbenedetti.blogspot.co.uk

"Tyger, tyger burning bright
In the forests of the night,
What immortal hand or eye
Could frame thy fearful symmetry?"

William Blake - Songs of Experience -1794 England
