lucene-solr-user mailing list archives

From Dalius Sidlauskas <dalius.sidlaus...@semantico.com>
Subject Wildcard ? issue?
Date Wed, 08 Feb 2012 15:44:21 GMT
Sorry for the inaccurate title.

I have three fields (dc_title, dc_title_unicode, dc_title_unicode_full) 
containing the same value:

<title xmlns="http://www.tei-c.org/ns/1.0">cal.lígraf</title>

The corresponding field types are configured as follows:

<fieldType name="xml" class="solr.TextField" positionIncrementGap="100">
  <analyzer type="index">
    <charFilter class="solr.HTMLStripCharFilterFactory"/>
    <tokenizer class="solr.StandardTokenizerFactory"/>
    <filter class="solr.ICUFoldingFilterFactory"/>
  </analyzer>
  <analyzer type="query">
    <tokenizer class="solr.StandardTokenizerFactory"/>
    <filter class="solr.ICUFoldingFilterFactory"/>
  </analyzer>
</fieldType>

<fieldType name="xml_unicode" class="solr.TextField" positionIncrementGap="100">
  <analyzer type="index">
    <charFilter class="solr.HTMLStripCharFilterFactory"/>
    <tokenizer class="solr.StandardTokenizerFactory"/>
  </analyzer>
  <analyzer type="query">
    <tokenizer class="solr.WhitespaceTokenizerFactory"/>
  </analyzer>
</fieldType>

<fieldType name="xml_unicode_full" class="solr.TextField" positionIncrementGap="100">
  <analyzer type="index">
    <charFilter class="solr.HTMLStripCharFilterFactory"/>
    <tokenizer class="solr.WhitespaceTokenizerFactory"/>
  </analyzer>
  <analyzer type="query">
    <tokenizer class="solr.WhitespaceTokenizerFactory"/>
  </analyzer>
</fieldType>

And finally my search configuration:

<requestHandler name="dictionary" class="solr.SearchHandler">
  <lst name="defaults">
    <str name="echoParams">all</str>
    <str name="defType">edismax</str>
    <str name="mm">2&lt;-25%</str>
    <str name="qf">dc_title_unicode_full^2 dc_title_unicode^2 dc_title</str>
    <int name="rows">10</int>
    <str name="spellcheck.onlyMorePopular">true</str>
    <str name="spellcheck.extendedResults">false</str>
    <str name="spellcheck.count">1</str>
  </lst>
  <arr name="last-components">
    <str>spellcheck</str>
  </arr>
</requestHandler>

I am trying to match the field with various search phrases, all of which 
are valid. These are the results:


#  Search phrase  Match?  Comment
1  cal.lígra?     yes
2  cal.ligra?     no      Changed í to i
3  cal.ligraf     yes
4  calligra?      no


The problem is attempt #2, which fails to match the data. Attempt #3, 
the same phrase with ? replaced by f, does match.

One more thing: if * is used instead of ?, other entries such as 
cal.lígrafia are matched, but not cal.lígraf.
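
For anyone wanting to reproduce the table, the four phrases can be sent to the "dictionary" handler with debugQuery=true so that the parsed query is returned alongside the results. The following is only a minimal Python sketch: the base URL (host, port, single-core path) is an assumption, and debug_url is a hypothetical helper name, not something taken from my setup.

```python
from urllib.parse import urlencode

# Assumption: Solr is reachable at this address; adjust host, port and core path.
BASE_URL = "http://localhost:8080/solr/select"

# The four search phrases from the results table.
PHRASES = ["cal.l\u00edgra?", "cal.ligra?", "cal.ligraf", "calligra?"]


def debug_url(phrase, base_url=BASE_URL):
    """Build a request URL for the 'dictionary' handler with debug output on."""
    params = {
        "q": phrase,            # urlencode percent-encodes this as UTF-8
        "qt": "dictionary",     # select the request handler by name
        "debugQuery": "true",   # include parsedQuery in the response
    }
    return base_url + "?" + urlencode(params)


# One URL per phrase, ready to paste into a browser or pass to curl.
URLS = [debug_url(p) for p in PHRASES]
for url in URLS:
    print(url)
```

The non-ASCII í and the ? wildcard are percent-encoded as UTF-8, which matches the UTF-8 request encoding configured in Tomcat.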

I also spotted a logic mismatch in the parsedQuery field of the debug output:

cal·lígraf:
+DisjunctionMaxQuery((dc_title:calligraf^2.0 | 
dc_title_unicode:cal·lígraf^3.0 | dc_title_unicode_full:cal·lígraf^3.0))

cal·lígra?:
+DisjunctionMaxQuery((dc_title:cal·lígra?^2.0 | 
dc_title_unicode:cal·lígra?^3.0 | dc_title_unicode_full:cal·lígra?^3.0))

Should the dc_title clause of the second query be "calligra?" instead?
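
The difference in the parsedQuery output above can be illustrated with a small Python sketch. It is only an approximation under stated assumptions: fold is a rough stand-in for ICUFoldingFilterFactory (NFKD decomposition, dropping combining marks and the middle dot, lowercasing), not the ICU algorithm itself, and dc_title_clause is a hypothetical helper that assumes, based purely on the debug output, that terms containing ? or * bypass the query analyzer.

```python
import unicodedata

MIDDLE_DOT = "\u00b7"  # the dot in cal·lígraf as shown in the debug output


def fold(term):
    """Rough approximation of accent folding; NOT the real ICU folding rules."""
    decomposed = unicodedata.normalize("NFKD", term)
    kept = [
        ch for ch in decomposed
        if not unicodedata.combining(ch) and ch != MIDDLE_DOT
    ]
    return "".join(kept).lower()


def dc_title_clause(term):
    """Term as it appears in the dc_title clause of parsedQuery.

    Assumption drawn from the debug output: wildcard terms are passed
    through unchanged, complete terms go through the folding analyzer.
    """
    if "?" in term or "*" in term:
        return term
    return fold(term)


print(dc_title_clause("cal\u00b7l\u00edgraf"))   # complete term, folded
print(dc_title_clause("cal\u00b7l\u00edgra?"))   # wildcard term, left as typed
print(fold("cal\u00b7l\u00edgra?"))              # what I expected to see instead
```

Under these assumptions the complete term comes out as calligraf, while the wildcard term keeps both the dot and the accent, which is exactly the asymmetry visible in the two parsed queries.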

Environment:
Tomcat 7.0.25 (request encoding UTF-8)
Solr 3.5.0
Oracle Java 7
Ubuntu 11.10

-- 
Regards!
Dalius Sidlauskas

