lucene-solr-user mailing list archives

From Jérôme Bernardes <jerome.bernar...@mappy.com>
Subject Re: Highlight with NGram and German S Sharp "ß"
Date Fri, 16 Oct 2015 14:07:54 GMT
Thanks for your reply, Scott.

I tried

bs.language=de&bs.country=de

Unfortunately, the problem still occurs.
I have just discovered that the problem affects not only "ß" but
also "æ" (which is mapped to "ae" at query and index time):
q=hae   -->   <em>hæna</em>
So it seems to me that the problem is related to any single character
that is mapped to several characters by <charFilter
class="solr.MappingCharFilterFactory"
mapping="mapping-ISOLatin1Accent.txt"/>
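One way to test this hypothesis is to analyze the same values with a stripped-down field type that keeps only the suspected pieces (the mapping charFilter and the edge n-gram filter), then compare the start/end offsets of the grams for "Rosenstraße" and "Rosensteinpark" in Solr's Analysis screen. The sketch below is hypothetical: the type name `autocomplete_ngram_debug` is made up, and the whitespace tokenizer stands in for the original PatternTokenizerFactory. A plausible mechanism, to be checked against the EdgeNGramTokenFilter source of your Lucene version: some versions narrow each gram's offsets only when the token's term length equals the length of its span in the original text. A one-to-many mapping breaks that equality ("Rosenstraße" spans 11 source characters but becomes the 12-character term "rosenstrasse"), so every gram would keep the whole word's offsets.

```xml
<!-- Hypothetical debug type: only the components suspected of affecting offsets -->
<fieldType name="autocomplete_ngram_debug" class="solr.TextField">
  <analyzer type="index">
    <!-- Same one-to-many mapping as the original (e.g. ß => ss, æ => ae) -->
    <charFilter class="solr.MappingCharFilterFactory"
                mapping="mapping-ISOLatin1Accent.txt"/>
    <!-- Assumption: whitespace tokenization is enough for single-word test values -->
    <tokenizer class="solr.WhitespaceTokenizerFactory"/>
    <filter class="solr.LowerCaseFilterFactory"/>
    <!-- Inspect the start/end offsets this filter emits for each gram -->
    <filter class="solr.EdgeNGramFilterFactory" minGramSize="1" maxGramSize="20"/>
  </analyzer>
  <analyzer type="query">
    <charFilter class="solr.MappingCharFilterFactory"
                mapping="mapping-ISOLatin1Accent.txt"/>
    <tokenizer class="solr.WhitespaceTokenizerFactory"/>
    <filter class="solr.LowerCaseFilterFactory"/>
  </analyzer>
</fieldType>
```

If the grams of "Rosenstraße" all report offsets 0 to 11 while those of "Rosensteinpark" end at their own length, that would support the charFilter explanation rather than a highlighter setting.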

Jérôme

On 13/10/2015 07:46, Scott Stults wrote:
> My guess is that the boundary scanner isn't configured right for your
> highlighter. Try setting the bs.language and bs.country parameters either
> in your request or in the requestHandler.
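Setting those parameters in the request handler might look like the sketch below. It assumes the FastVectorHighlighter is in use (boundary scanners only apply to it, and it requires termVectors, termPositions and termOffsets on the highlighted field) and that solrconfig.xml declares a boundary scanner named `breakIterator`, such as the one in Solr's example config. As request parameters the names carry an `hl.` prefix (`hl.bs.language`, `hl.bs.country`). The handler name `/select` is an assumption; the field name `textng` comes from the thread.

```xml
<!-- Sketch: the handler name /select is an assumption -->
<requestHandler name="/select" class="solr.SearchHandler">
  <lst name="defaults">
    <str name="hl">true</str>
    <str name="hl.fl">textng</str>
    <!-- Boundary scanners are only consulted by the FastVectorHighlighter -->
    <str name="hl.useFastVectorHighlighter">true</str>
    <!-- Assumes a boundaryScanner named breakIterator is declared in the highlight component -->
    <str name="hl.boundaryScanner">breakIterator</str>
    <str name="hl.bs.type">WORD</str>
    <str name="hl.bs.language">de</str>
    <str name="hl.bs.country">DE</str>
  </lst>
</requestHandler>
```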
>
>
> k/r,
> Scott
>
> On Mon, Oct 5, 2015 at 4:57 AM, Jérôme Bernardes <jerome.bernardes@mappy.com> wrote:
>> Dear Solr Users,
>> I am facing a problem with highlighting on ngram fields.
>> Highlighting works well, except for words containing the German character
>> "ß".
>> E.g., with q=rosen:
>> "highlighting": {
>>          "gcl3r:12723710:6643": {
>>              "textng": [
>>                  "<em>Rosen</em>steinpark (Métro), Stuttgart (Allemagne)"
>>              ]
>>          },
>>          "gcl3r:2267495:780930": {
>>              "textng": [
>>                  "<em>Rosenstraße</em>, 94554 Moos (Allemagne)"
>>              ]
>>          }
>>      }
>> Words without "ß" are highlighted partially (<em>Rosen</em>steinpark), but
>> with "ß" the whole word is highlighted (<em>Rosenstraße</em>).
>>
>> -------------
>> The character "ß" is mapped to "ss" at query and index time (using
>> <charFilter class="solr.MappingCharFilterFactory"
>> mapping="mapping-ISOLatin1Accent.txt"/>).
>>
>> Here is the schema.xml definition of the highlighted field:
>> <fieldType name="autocomplete_ngram" class="solr.TextField">
>>    <analyzer type="index">
>>      <charFilter class="solr.MappingCharFilterFactory"
>> mapping="mapping-ISOLatin1Accent.txt"/>
>>      <!--<tokenizer class="solr.StandardTokenizerFactory"/>-->
>>      <tokenizer class="solr.PatternTokenizerFactory" pattern="[\s,;: \-\']"/>
>>      <filter class="solr.WordDelimiterFilterFactory"
>>          splitOnNumerics="0"
>>          generateWordParts="1"
>>          generateNumberParts="1"
>>          catenateWords="0"
>>          catenateNumbers="0"
>>          catenateAll="0"
>>          splitOnCaseChange="1"
>>          preserveOriginal="1"
>>          types="wdfftypes.txt"
>>          />
>>      <filter class="solr.LowerCaseFilterFactory"/>
>>      <filter class="solr.SynonymFilterFactory" synonyms="synonym.txt"
>> ignoreCase="true" expand="true"/>
>>      <filter class="solr.EdgeNGramFilterFactory" maxGramSize="20"
>> minGramSize="1"/>
>>      <filter class="solr.PatternReplaceFilterFactory" pattern="([^\w\d \*&amp;æøåÆØÅ ])" replacement="" replace="all"/>
>>    </analyzer>
>>    <analyzer type="query">
>>      <charFilter class="solr.MappingCharFilterFactory"
>> mapping="mapping-ISOLatin1Accent.txt"/>
>>      <!--<tokenizer class="solr.StandardTokenizerFactory"/>-->
>>      <tokenizer class="solr.PatternTokenizerFactory" pattern="[\s,;: \-\']"/>
>>      <filter class="solr.WordDelimiterFilterFactory"
>>          splitOnNumerics="0"
>>          generateWordParts="1"
>>          generateNumberParts="0"
>>          catenateWords="0"
>>          catenateNumbers="0"
>>          catenateAll="0"
>>          splitOnCaseChange="0"
>>          preserveOriginal="1"
>>          types="wdfftypes.txt"
>>          />
>>      <filter class="solr.LowerCaseFilterFactory"/>
>>      <filter class="solr.PatternReplaceFilterFactory" pattern="([^\w\d \*&amp;æøåÆØÅ ])" replacement="" replace="all"/>
>>      <filter class="solr.PatternReplaceFilterFactory"
>> pattern="^(.{20})(.*)?" replacement="$1" replace="all"/>
>>    </analyzer>
>> </fieldType>
>>
>> Is this a problem in our configuration, or a known bug?
>> Regards
>> Jérôme
>>
>>
>

