lucene-solr-user mailing list archives

From Geoffrey Young <ge...@modperlcookbook.org>
Subject camel-casing and dismax troubles
Date Tue, 12 May 2009 23:19:04 GMT
hi all :)

I'm having trouble with camel-cased query strings and the dismax handler.

a user query

 LeAnn Rimes

isn't matching the indexed term

 Leann Rimes

even though both end up lower-cased by the analyzers.  furthermore, the
analysis tool shows a match between the two.

the debug query looks like

 "parsedquery":"+((DisjunctionMaxQuery((search-en:\"(leann le) ann\")) DisjunctionMaxQuery((search-en:rimes)))~2) ()",

I have a feeling it's due to how the broken-up tokens are added back
into the token stream with preserveOriginal, and some strange
interaction between that token order and dismax, but I'm not entirely sure.
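
to make that hunch concrete, here's a rough python sketch of the token
positions the parsed query above seems to imply.  it is not Solr's code:
split_on_case_change, analyze and phrase_matches are made-up helpers, the
splitter only handles case changes, and the phrase check is a naive
stand-in for how I assume the generated phrase query gets evaluated.

```python
import re


def split_on_case_change(word):
    """Split at case changes only, e.g. 'LeAnn' -> ['Le', 'Ann'] (hypothetical helper)."""
    return re.findall(r"[A-Z]?[a-z]+|[A-Z]+|\d+", word)


def analyze(text):
    """Return lower-cased (term, position) pairs for whitespace-separated words.

    Assumption: with preserveOriginal, the original token shares the
    position of the first generated part, as the parsed query suggests.
    """
    tokens = []
    position = 0
    for word in text.split():
        parts = split_on_case_change(word)
        if len(parts) > 1:
            tokens.append((word.lower(), position))  # preserved original
            for part in parts:
                tokens.append((part.lower(), position))
                position += 1
        else:
            tokens.append((word.lower(), position))
            position += 1
    return tokens


def phrase_matches(query_tokens, doc_tokens):
    """Naive phrase check: every query position needs one of its terms
    at the matching relative position in the document."""
    doc = {}
    for term, pos in doc_tokens:
        doc.setdefault(pos, set()).add(term)
    slots = {}
    for term, pos in query_tokens:
        slots.setdefault(pos, set()).add(term)
    offsets = sorted(slots)
    base = offsets[0]
    for start in doc:
        if all(slots[o] & doc.get(start + (o - base), set()) for o in offsets):
            return True
    return False


if __name__ == "__main__":
    print(analyze("LeAnn Rimes"))  # query side
    print(analyze("Leann Rimes"))  # index side
    print(phrase_matches(analyze("LeAnn"), analyze("Leann Rimes")))
```

if this model is anywhere near right, the phrase needs an "ann" right
after "leann" (or "le"), which the indexed "Leann Rimes" never has, so the
first clause can't match even though the lower-cased original is present.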

configs follow.  thoughts appreciated.

--Geoff

  <fieldType name="search-en" class="solr.TextField" positionIncrementGap="100">
    <analyzer type="index">
      <tokenizer class="solr.WhitespaceTokenizerFactory"/>
      <filter class="solr.ISOLatin1AccentFilterFactory" />
      <filter class="solr.WordDelimiterFilterFactory" preserveOriginal="1"
                                                      generateWordParts="1"
                                                      generateNumberParts="1"
                                                      catenateWords="1"
                                                      catenateNumbers="1"
                                                      catenateAll="1"/>
      <filter class="solr.LowerCaseFilterFactory"/>
      <filter class="solr.SynonymFilterFactory" synonyms="synonyms.txt" ignoreCase="true" expand="true"/>
      <filter class="solr.StopFilterFactory" ignoreCase="false" words="stopwords-en.txt"/>
    </analyzer>

    <analyzer type="query">
      <tokenizer class="solr.WhitespaceTokenizerFactory"/>
      <filter class="solr.ISOLatin1AccentFilterFactory" />
      <filter class="solr.WordDelimiterFilterFactory" preserveOriginal="1"
                                                      generateWordParts="1"
                                                      generateNumberParts="1"
                                                      catenateWords="0"
                                                      catenateNumbers="0"
                                                      catenateAll="0"/>
      <filter class="solr.LowerCaseFilterFactory"/>
      <filter class="solr.StopFilterFactory" ignoreCase="false" words="stopwords-en.txt"/>
    </analyzer>
  </fieldType>
