lucene-solr-user mailing list archives

From "Jack Krupansky" <j...@basetechnology.com>
Subject Re: Unexpected query rewrite from WordDelimiterFilterFactory and SynonymFilterFactory
Date Tue, 15 May 2012 02:22:12 GMT
If it is important enough for you, you could expand multi-word and compound 
word synonyms as a preprocessing step and generate an "OR" expression in the 
query.
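
A minimal sketch of that preprocessing step in Python. It assumes a hand-maintained synonym table; `SYNONYM_GROUPS`, `expand_query` and the other helpers are hypothetical and not part of Solr. The code greedily finds the longest multi-word form in the user's query and replaces it with an OR group. Multi-word and hyphenated forms are quoted as phrases. Remaining clauses are joined with AND, which is an assumption you would adapt to your handler's default operator. It does not escape Lucene special characters.

```python
from typing import Dict, List

# Hypothetical synonym table: every form in a group is treated as equivalent.
SYNONYM_GROUPS: List[List[str]] = [
    ["wifi", "wi-fi", "wi fi"],
    ["email", "e-mail", "electronic mail"],
]


def build_lookup(groups: List[List[str]]) -> Dict[str, List[str]]:
    """Map every lowercased form to the full group it belongs to."""
    lookup: Dict[str, List[str]] = {}
    for group in groups:
        for form in group:
            lookup[form.lower()] = group
    return lookup


def clause(field: str, term: str) -> str:
    """Quote forms the field analyzer would split, so word order is kept."""
    if " " in term or "-" in term:
        return f'{field}:"{term}"'
    return f"{field}:{term}"


def expand_query(field: str, user_query: str,
                 groups: List[List[str]] = SYNONYM_GROUPS) -> str:
    """Rewrite user_query so each synonym occurrence becomes an OR group."""
    lookup = build_lookup(groups)
    max_words = max((len(f.split()) for g in groups for f in g), default=1)
    tokens = user_query.lower().split()
    parts: List[str] = []
    i = 0
    while i < len(tokens):
        # Greedy longest match: try the longest word sequence starting at i first.
        for n in range(min(max_words, len(tokens) - i), 0, -1):
            candidate = " ".join(tokens[i:i + n])
            if candidate in lookup:
                options = [clause(field, form) for form in lookup[candidate]]
                parts.append("(" + " OR ".join(options) + ")")
                i += n
                break
        else:
            # No synonym starts here: keep the word as an ordinary clause.
            parts.append(clause(field, tokens[i]))
            i += 1
    return " AND ".join(parts)


print(expand_query("test1", "Wi Fi router"))
# (test1:wifi OR test1:"wi-fi" OR test1:"wi fi") AND test1:router
```

Because each form gets its own clause, a document indexed as plain "wifi" can still match the bare `test1:wifi` clause, even if the field's query analyzer rewrites the quoted clauses into phrase queries.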

-- Jack Krupansky

-----Original Message----- 
From: Chung Wu
Sent: Monday, May 14, 2012 8:25 PM
To: solr-user@lucene.apache.org
Subject: Re: Unexpected query rewrite from WordDelimiterFilterFactory and 
SynonymFilterFactory

Thanks Jack!  It's too bad I can't have catenate and generateParts both set
to "1" at query time.  If I set catenate to "0", then I miss the case where
"wifi" is indexed but "wi-fi" is queried.  If I set generateParts to "0",
then I miss the case where "wi fi" is queried but "wi-fi" is indexed.   I
guess I'll just have to pick one!

Chung

On Mon, May 14, 2012 at 4:50 PM, Jack Krupansky <jack@basetechnology.com> wrote:

> The extra terms are okay at index time - they simply overlap the base
> words and make composite terms more searchable, but you need to have a
> separate query analyzer that sets the various catenate options to "0" since
> the query generator doesn't know what to do with the extra terms. Synonyms
> are a little more tricky - the simplest thing is to disable them in the
> index analyzer and do them only in the query analyzer - and multi-term
> synonyms don't work well, except for replacement synonyms at index time.
>
> See the "text_en_splitting" field type in the example schema.
>
> -- Jack Krupansky
>
> -----Original Message-----
> From: Chung Wu
> Sent: Monday, May 14, 2012 7:01 PM
> To: solr-user@lucene.apache.org
> Subject: Unexpected query rewrite from WordDelimiterFilterFactory and
> SynonymFilterFactory
>
>
> Hi all!
>
> I'm using Solr 3.6, and I'm seeing unexpected query rewriting when using
> either WordDelimiterFilterFactory with catenateWords="1" or
> SynonymFilterFactory with multi-word synonyms.
>
> For example, take this field type, where the query analyzer applies
> WordDelimiterFilterFactory with catenateWords="1":
>
>   <fieldType name="testType" class="solr.TextField"
>              positionIncrementGap="100" autoGeneratePhraseQueries="true">
>     <analyzer>
>       <tokenizer class="solr.WhitespaceTokenizerFactory"/>
>       <filter class="solr.WordDelimiterFilterFactory"
>               generateWordParts="1" generateNumberParts="1" catenateWords="1"
>               catenateNumbers="1" catenateAll="0" splitOnCaseChange="0"/>
>     </analyzer>
>   </fieldType>
>
> For the query "wi-fi", the term positions after WordDelimiterFilterFactory
> look like this:
>
>   term text   position   startOffset   endOffset   type
>   wi          1          0             2           word
>   fi          2          3             5           word
>   wifi        2          0             5           word
>
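To make that table easier to reason about, here is a toy Python model of what the filter emits for a single whitespace token. It assumes letters-only input and ignores generateNumberParts, catenateNumbers, case changes and the filter's other options. The names `Token` and `word_delimiter` are hypothetical. The stacking rule (the catenated term shares the last part's position) is inferred from the positions and offsets above, not taken from the Lucene source.

```python
import re
from dataclasses import dataclass
from typing import List


@dataclass
class Token:
    text: str
    position: int      # 1-based, as in the analysis table
    start_offset: int
    end_offset: int
    type: str = "word"


def word_delimiter(text: str, generate_word_parts: bool = True,
                   catenate_words: bool = True) -> List[Token]:
    """Toy model of WordDelimiterFilterFactory for one letters-only token."""
    parts = [(m.group(), m.start(), m.end()) for m in re.finditer(r"[A-Za-z]+", text)]
    if len(parts) <= 1:
        # Nothing to split: emit the single word part as one term.
        return [Token(p[0], 1, p[1], p[2]) for p in parts]
    tokens: List[Token] = []
    position = 0
    if generate_word_parts:
        for word, start, end in parts:
            position += 1
            tokens.append(Token(word, position, start, end))
    if catenate_words:
        # Catenated form is stacked (position increment 0) on the last part.
        tokens.append(Token("".join(p[0] for p in parts), max(position, 1),
                            parts[0][1], parts[-1][2]))
    return tokens


for t in word_delimiter("wi-fi"):
    print(t.position, t.text, t.start_offset, t.end_offset, t.type)
# 1 wi 0 2 word
# 2 fi 3 5 word
# 2 wifi 0 5 word
```

In this model, `catenate_words=False` leaves only `wi` at position 1 and `fi` at position 2. That is a plain two-position stream with nothing stacked.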
>
> And looking at debug output, the parsed query looks like this, which is
> surprising:
>
> <str name="rawquerystring">test1:"wi-fi"</str>
> <str name="querystring">test1:"wi-fi"</str>
> <str name="parsedquery">MultiPhraseQuery(test1:"wi (fi wifi)")</str>
> <str name="parsedquery_toString">test1:"wi (fi wifi)"</str>
>
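The rewrite follows from the token positions: terms that share a position become alternatives at one slot of the phrase. Below is a small Python sketch that takes the token stream from the table as (text, position) pairs. `group_by_position`, `render_phrase` and `phrase_matches` are hypothetical helpers, and the matcher only checks exact consecutive positions (slop 0). It reproduces the `"wi (fi wifi)"` form. It also shows why a document indexed simply as `wifi` is missed: the phrase still requires `wi` in the slot before it.

```python
from itertools import groupby
from typing import Dict, List, Set, Tuple

TokenPos = Tuple[str, int]  # (term text, 1-based position)


def group_by_position(tokens: List[TokenPos]) -> List[List[str]]:
    """Collect terms that share a position into one slot, in position order."""
    ordered = sorted(tokens, key=lambda t: t[1])  # stable: keeps emission order
    return [[text for text, _ in grp]
            for _pos, grp in groupby(ordered, key=lambda t: t[1])]


def render_phrase(field: str, slots: List[List[str]]) -> str:
    """Format slots like the parsedquery_toString output in the debug view."""
    rendered = [s[0] if len(s) == 1 else "(" + " ".join(s) + ")" for s in slots]
    return field + ':"' + " ".join(rendered) + '"'


def phrase_matches(slots: List[List[str]], doc: List[TokenPos]) -> bool:
    """True if, for some start position, slot i matches a term at start + i."""
    at: Dict[int, Set[str]] = {}
    for text, pos in doc:
        at.setdefault(pos, set()).add(text)
    return any(
        all(at.get(start + i, set()) & set(slot) for i, slot in enumerate(slots))
        for start in at
    )


query_tokens = [("wi", 1), ("fi", 2), ("wifi", 2)]  # from the analysis table
slots = group_by_position(query_tokens)
print(render_phrase("test1", slots))  # test1:"wi (fi wifi)"
print(phrase_matches(slots, [("wi", 1), ("fi", 2), ("wifi", 2)]))  # True: indexed "wi-fi"
print(phrase_matches(slots, [("wifi", 1)]))  # False: indexed "wifi"
```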
>
> I see similar things happening if I use SynonymFilterFactory with
> multi-word synonyms (maybe related to this bug:
> https://issues.apache.org/jira/browse/SOLR-3390; I originally asked about
> it here:
> http://stackoverflow.com/questions/10218224/in-solr-expanding-multi-word-synonyms-and-term-positions)
>
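The synonym case can be modelled the same way. This Python sketch simplifies how a query-time rule like `wifi, wi fi` is laid out, and the layout is an assumption for illustration, not a description of the Solr 3.6 implementation. It assumes the original token is kept, the first word of the multi-word synonym is stacked on the same position, and the remaining words take the following positions. The rule and helper names are hypothetical. Under that assumption, the query `wifi` becomes a two-slot phrase that a document containing only `wifi` cannot satisfy.

```python
from typing import Dict, List, Set, Tuple

TokenPos = Tuple[str, int]  # (term text, 1-based position)


def expand_synonyms(term: str, synonyms: Dict[str, List[str]]) -> List[TokenPos]:
    """Assumed layout: each multi-word synonym starts on the original token's
    position and its later words spill into the following positions."""
    tokens: List[TokenPos] = [(term, 1)]
    for synonym in synonyms.get(term, []):
        for offset, word in enumerate(synonym.split()):
            tokens.append((word, 1 + offset))
    return tokens


def to_slots(tokens: List[TokenPos]) -> List[List[str]]:
    """Group terms by position, in position order (one list per phrase slot)."""
    by_pos: Dict[int, List[str]] = {}
    for text, pos in tokens:
        by_pos.setdefault(pos, []).append(text)
    return [by_pos[p] for p in sorted(by_pos)]


def phrase_matches(slots: List[List[str]], doc: List[TokenPos]) -> bool:
    """True if, for some start position, slot i matches a term at start + i."""
    at: Dict[int, Set[str]] = {}
    for text, pos in doc:
        at.setdefault(pos, set()).add(text)
    return any(
        all(at.get(start + i, set()) & set(slot) for i, slot in enumerate(slots))
        for start in at
    )


synonyms = {"wifi": ["wi fi"]}  # hypothetical rule: wifi, wi fi
slots = to_slots(expand_synonyms("wifi", synonyms))
print(slots)  # [['wifi', 'wi'], ['fi']]
print(phrase_matches(slots, [("wifi", 1)]))  # False: document that just says "wifi"
print(phrase_matches(slots, [("wi", 1), ("fi", 2)]))  # True: document that says "wi fi"
```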
> Any ideas on what I'm supposed to do to make this work as expected?
>
> Thanks!
>
> Chung
> 

