lucene-solr-user mailing list archives

From Erick Erickson <erickerick...@gmail.com>
Subject Re: SynonymFilterFactory breaking WordDelimiterFilterFactory output
Date Fri, 23 Nov 2012 13:50:24 GMT
The best advice here is to look hard at the admin/analysis page and see
what each filter in the chain emits.

But a couple of notes:
1> It's usually unnecessary to include the exact same synonyms in both the
query-time and index-time chains. Index-time is preferred.

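2> Putting LowerCaseFilterFactory in front of WordDelimiterFilterFactory is
going to break WDFF _if_ you intend camel-case to produce multiple tokens:
once everything is lowercased, splitOnCaseChange has no case changes left
to split on.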

3> Brackets? Did you mean parentheses? If so, I suspect your issue is in
your request handler, not your analysis chain. Perhaps something to do with
autoGeneratePhraseQueries?

Providing the &debugQuery=true output would help.
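
For what it's worth, notes 1 and 2 might look roughly like the sketch below when applied to the text_title field type quoted further down. This is only an illustration under stated assumptions, not a confirmed fix for the phrase-query behavior. It assumes camel-case splitting is actually wanted (hence WordDelimiterFilterFactory ahead of LowerCaseFilterFactory). It also assumes synonyms.txt holds an equivalence list such as `8,eight` rather than the one-way mapping `8=>eight`; with a one-way mapping applied only at index time, a query for "8" would no longer match the indexed "eight". Attribute values are copied from the original config; only the filter order and the query-time synonym filter change.

```xml
<fieldType name="text_title" class="solr.TextField"
           positionIncrementGap="100" autoGeneratePhraseQueries="true">
  <analyzer type="index">
    <tokenizer class="solr.WhitespaceTokenizerFactory"/>
    <filter class="solr.ASCIIFoldingFilterFactory"/>
    <!-- WDFF runs before lowercasing, so splitOnCaseChange still sees case changes -->
    <filter class="solr.WordDelimiterFilterFactory"
            generateWordParts="1"
            generateNumberParts="1"
            catenateWords="1"
            catenateNumbers="1"
            catenateAll="0"
            preserveOriginal="1"
            splitOnCaseChange="1"
            protected="protwords.txt"/>
    <filter class="solr.LowerCaseFilterFactory"/>
    <!-- Assumption: synonyms.txt contains "8,eight", so expand="true" indexes both forms -->
    <filter class="solr.SynonymFilterFactory"
            synonyms="synonyms.txt"
            ignoreCase="true"
            expand="true"/>
  </analyzer>
  <analyzer type="query">
    <tokenizer class="solr.WhitespaceTokenizerFactory"/>
    <filter class="solr.ASCIIFoldingFilterFactory"/>
    <filter class="solr.WordDelimiterFilterFactory"
            generateWordParts="1"
            generateNumberParts="1"
            catenateWords="0"
            catenateNumbers="0"
            catenateAll="0"
            preserveOriginal="1"
            splitOnCaseChange="1"
            protected="protwords.txt"/>
    <filter class="solr.LowerCaseFilterFactory"/>
    <!-- No SynonymFilterFactory here: synonyms are applied at index time only -->
  </analyzer>
</fieldType>
```

A change like this requires a full re-index, and the admin/analysis page should then show how "8mile" and "8-mile" come out of each chain.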

Best
Erick


On Tue, Nov 20, 2012 at 10:55 PM, Chris Book <chrisbook@gmail.com> wrote:

> Hello, I've recently upgraded from Solr 1.4.1 to 3.6.1 and am running into
> a problem with a specific query.  When I search for "8mile" or "8-mile"
> without the quotes, and I use just the WordDelimiterFilterFactory as
> configured below, I get this query which is as expected: album:"(8mile 8)
> mile"
>
> But when I also add in the SynonymFilterFactory config listed below, I get
> this query instead: album:"8mile eight mile".  In my test the only contents
> of synonyms.txt is 8=>eight.  The issue with the 2nd query is the brackets
> are removed so it now seems to require all 3 terms as a phrase.
>
> So why does WordDelimiterFilterFactory generate the query I want with both
> original and split phrases, but when the number 8 is replaced with eight,
> that data is lost and I end up with a phrase that will cause no results to
> be found?
>
> This is part of a test case that I believe used to work on 1.4.1, but I
> still have to confirm that.
>
>
>     <fieldType name="text_title" class="solr.TextField"
> positionIncrementGap="100" autoGeneratePhraseQueries="true">
>       <analyzer type="index">
>         <tokenizer class="solr.WhitespaceTokenizerFactory"/>
>         <filter class="solr.ASCIIFoldingFilterFactory"/>
>         <filter class="solr.LowerCaseFilterFactory"/>
>         <filter class="solr.WordDelimiterFilterFactory"
>                 generateWordParts="1"
>                 generateNumberParts="1"
>                 catenateWords="1"
>                 catenateNumbers="1"
>                 catenateAll="0"
>                 preserveOriginal="1"
>                 splitOnCaseChange="1"
>                 protected="protwords.txt"
>                 />
>         <filter class="solr.SynonymFilterFactory"
>                 synonyms="synonyms.txt"
>                 ignoreCase="true"
>                 expand="true"/>
>       </analyzer>
>       <analyzer type="query">
>         <tokenizer class="solr.WhitespaceTokenizerFactory"/>
>         <filter class="solr.ASCIIFoldingFilterFactory"/>
>         <filter class="solr.LowerCaseFilterFactory"/>
>         <filter class="solr.WordDelimiterFilterFactory"
>                 generateWordParts="1"
>                 generateNumberParts="1"
>                 catenateWords="0"
>                 catenateNumbers="0"
>                 catenateAll="0"
>                 preserveOriginal="1"
>                 splitOnCaseChange="1"
>                 protected="protwords.txt"
>                 />
>         <filter class="solr.SynonymFilterFactory"
>                 synonyms="synonyms.txt"
>                 ignoreCase="true"
>                 expand="true"/>
>       </analyzer>
>     </fieldType>
>
> Note that I have updated my schema version to 1.5 and my luceneMatchVersion
> to LUCENE_36.
>
> Thanks,
> Chris
>
