lucene-solr-user mailing list archives

From Preeti Bhat <preeti.b...@shoregrp.com>
Subject Using asterisk (*) with Unicode characters.
Date Wed, 28 Jun 2017 13:25:32 GMT
Hi All,

I have a requirement where the user may enter a name using either Unicode or plain ASCII
characters and expects the same results in both cases.

For example, MöllerGruppen AS and MollerGruppen AS should return the same results.

I am able to achieve this using <filter class="solr.ASCIIFoldingFilterFactory" preserveOriginal="true"/>,
but for some reason, when I search for MöllerGruppen*, I get the error below.

    "metadata":[
      "error-class","org.apache.solr.common.SolrException",
      "root-error-class","org.apache.solr.common.SolrException"],
    "msg":"analyzer returned too many terms for multiTerm term: MöllerGruppen",
    "code":400}}

It works for MollerGruppen* though.

Could someone please advise on this?

Below is the fieldType definition for this field.

<fieldType name="string_ci" class="solr.TextField">
  <analyzer type="index">
    <charFilter class="solr.HTMLStripCharFilterFactory"/>
    <tokenizer class="solr.WhitespaceTokenizerFactory"/>
    <filter class="solr.LowerCaseFilterFactory"/>
    <filter class="solr.ASCIIFoldingFilterFactory" preserveOriginal="true"/>
    <filter class="solr.TrimFilterFactory"/>
    <filter class="solr.StopFilterFactory" words="stopwords.txt" ignoreCase="true"/>
    <filter class="solr.WordDelimiterFilterFactory" generateWordParts="1" splitOnCaseChange="0"
            catenateWords="1" splitOnNumerics="0" stemEnglishPossessive="0" preserveOriginal="1"/>
  </analyzer>
  <analyzer type="query">
    <charFilter class="solr.HTMLStripCharFilterFactory"/>
    <tokenizer class="solr.WhitespaceTokenizerFactory"/>
    <filter class="solr.LowerCaseFilterFactory"/>
    <filter class="solr.ASCIIFoldingFilterFactory" preserveOriginal="true"/>
    <filter class="solr.TrimFilterFactory"/>
    <filter class="solr.StopFilterFactory" words="stopwords.txt" ignoreCase="true"/>
    <filter class="solr.WordDelimiterFilterFactory" generateWordParts="1" splitOnCaseChange="0"
            catenateWords="1" splitOnNumerics="0" stemEnglishPossessive="0" preserveOriginal="1"/>
  </analyzer>
</fieldType>
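
A possible explanation, offered as an assumption rather than a confirmed diagnosis: for wildcard
queries Solr analyzes the term with a separate "multiterm" analyzer, which must produce exactly
one term. With preserveOriginal="true", ASCIIFoldingFilterFactory can emit both möllergruppen and
mollergruppen for the single input MöllerGruppen, which would trigger the "too many terms" error.
MollerGruppen* has nothing to fold, so only one term comes out. A minimal sketch of one workaround
is to declare an explicit multiterm analyzer that folds without preserving the original. The
choice of KeywordTokenizerFactory here is an assumption, and the fragment has not been tested
against this schema.

```xml
<!-- Sketch only: place inside the existing <fieldType name="string_ci"> element,
     alongside the index and query analyzers. -->
<analyzer type="multiterm">
  <!-- Keep the whole wildcard fragment as a single token -->
  <tokenizer class="solr.KeywordTokenizerFactory"/>
  <filter class="solr.LowerCaseFilterFactory"/>
  <!-- preserveOriginal is left at its default (false), so only the folded term is emitted -->
  <filter class="solr.ASCIIFoldingFilterFactory"/>
</analyzer>
```

Under this sketch, MöllerGruppen* would be rewritten to mollergruppen*. Because the index analyzer
keeps both the original and the folded form, the folded wildcard should still match documents
containing MöllerGruppen.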



Thanks and Regards,
Preeti





