lucene-dev mailing list archives

Site index · List index
Message view « Date » · « Thread »
Top « Date » · « Thread »
From "Steve Rowe (JIRA)" <j...@apache.org>
Subject [jira] [Commented] (SOLR-11968) Multi-words query time synonyms
Date Wed, 21 Feb 2018 03:15:00 GMT

    [ https://issues.apache.org/jira/browse/SOLR-11968?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16370899#comment-16370899
] 

Steve Rowe commented on SOLR-11968:
-----------------------------------

bq. LUCENE-4065 should probably be closed as won't-fix (I'll comment there in a sec).

Maybe not?  Although the {{enablePositionIncrements()}} option was removed from StopFilter
et al via LUCENE-4963, Robert Muir wrote that the idea in LUCENE-4065 may still have merit:
[https://discuss.elastic.co/t/stop-filter-problem-enablepositionincrements-false-is-not-supported-anymore-as-of-lucene-4-4-as-it-can-create-broken-token-streams/13457/5]

> Multi-words query time synonyms
> -------------------------------
>
>                 Key: SOLR-11968
>                 URL: https://issues.apache.org/jira/browse/SOLR-11968
>             Project: Solr
>          Issue Type: Bug
>      Security Level: Public(Default Security Level. Issues are Public) 
>          Components: query parsers, Schema and Analysis
>    Affects Versions: master (8.0), 6.6.2
>         Environment: Centos 7.x
>            Reporter: Dominique Béjean
>            Priority: Major
>
> I am trying multi words query time synonyms with Solr 6.6.2 and SynonymGraphFilterFactory
filter as explain in this article
>  [https://lucidworks.com/2017/04/18/multi-word-synonyms-solr-adds-query-time-support/]
>   
>  My field type is :
> {code:java}
> <fieldType name="textSyn" class="solr.TextField" positionIncrementGap="100">
>      <analyzer type="index">
>        <tokenizer class="solr.StandardTokenizerFactory"/>
>        <filter class="solr.ElisionFilterFactory" ignoreCase="true" 
>              articles="lang/contractions_fr.txt"/>
>        <filter class="solr.LowerCaseFilterFactory"/>
>        <filter class="solr.ASCIIFoldingFilterFactory"/>
>        <filter class="solr.StopFilterFactory" words="stopwords.txt" ignoreCase="true"/>
>        <filter class="solr.FrenchMinimalStemFilterFactory"/>
>      </analyzer>
>      <analyzer type="query">
>        <tokenizer class="solr.StandardTokenizerFactory"/>
>        <filter class="solr.ElisionFilterFactory" ignoreCase="true" 
>              articles="lang/contractions_fr.txt"/>
>        <filter class="solr.LowerCaseFilterFactory"/>
>        <filter class="solr.SynonymGraphFilterFactory" synonyms="synonyms.txt"
>              ignoreCase="true" expand="true"/>
>        <filter class="solr.ASCIIFoldingFilterFactory"/>
>        <filter class="solr.StopFilterFactory" words="stopwords.txt" ignoreCase="true"/>
>        <filter class="solr.FrenchMinimalStemFilterFactory"/>
>      </analyzer>
>    </fieldType>{code}
>  
>  synonyms.txt contains the line :
> {code:java}
> om, olympique de marseille{code}
>  
>  stopwords.txt contains the word 
> {code:java}
> de{code}
>  
>  The order of words in my query has an impact on the generated query in edismax
> {code:java}
> q={!edismax qf='name_text_gp' v=$qq}
>  &sow=false
>  &qq=...{code}
> with "qq=om maillot" or "qq=olympique de marseille maillot", I can see the synonyms expansion.
It is working as expected.
> {code:java}
> "parsedquery_toString":"+(((+name_text_gp:olympiqu +name_text_gp:marseil +name_text_gp:maillot)
name_text_gp:om))",
>  "parsedquery_toString":"+((name_text_gp:om (+name_text_gp:olympiqu +name_text_gp:marseil
+name_text_gp:maillot)))",{code}
> with "qq=maillot om" or "qq=maillot olympique de marseille", I can see the same generated
query 
> {code:java}
> "parsedquery_toString":"+((name_text_gp:maillot) (name_text_gp:om))",
>  "parsedquery_toString":"+((name_text_gp:maillot) (name_text_gp:om))",{code}
> I don't understand these generated queries. The first one looks like the synonym expansion
is ignored, but the second one shows it is not ignored and only the synonym term is used.
>   
>  When I test the analisys for the field type the synonyms are correctly expanded for
both expressions
> {code:java}
> om maillot  
>  maillot om
>  olympique de marseille maillot
>  maillot olympique de marseille{code}
> resulting outputs always include the following terms (obvioulsly not always in the same
order)
> {code:java}
> olympiqu om marseil maillot {code}
>  
>  So, i suspect an issue with edismax query parser.



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)

---------------------------------------------------------------------
To unsubscribe, e-mail: dev-unsubscribe@lucene.apache.org
For additional commands, e-mail: dev-help@lucene.apache.org


Mime
View raw message