lucene-dev mailing list archives

From "Steve Rowe (JIRA)" <j...@apache.org>
Subject [jira] [Comment Edited] (SOLR-11968) Multi-words query time synonyms
Date Thu, 22 Feb 2018 17:32:00 GMT

    [ https://issues.apache.org/jira/browse/SOLR-11968?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16373123#comment-16373123 ]

Steve Rowe edited comment on SOLR-11968 at 2/22/18 5:31 PM:
------------------------------------------------------------

bq. The posLen of olympique is 1 but marseille has a posInc of 2. This means that there is a hole between olympique and marseille but posLen doesn't indicate this hole.

I think you're wrong, [~jim.ferenczi].

posLen (on olympique) doesn't have to indicate this hole, because it has nothing to do with the gap: posLen says how many positions a token spans, while the gap is recorded by the posInc of the following token (marseille).

bq. olympique points to a state that doesn't exist

Aha, this is the crux, I assume: the "state that doesn't exist" isn't explicitly represented by these two attributes; it has to be inferred.  IMHO the brokenness here is in the inability to handle gaps, not in the token filters that produce them.
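
To make the inference concrete, here is a minimal Python sketch (not Lucene code) that turns posInc/posLen pairs into graph arcs and lists the nodes that some token arrives at but no token leaves. The token values are my assumption of what this analysis chain would emit for "maillot olympique de marseille" once "de" has been removed; `Token`, `to_arcs` and `dangling_nodes` are hypothetical names used only for illustration.

```python
from typing import NamedTuple


class Token(NamedTuple):
    term: str
    pos_inc: int  # position increment relative to the previous token
    pos_len: int  # number of positions the token spans


def to_arcs(tokens):
    """Return (term, from_node, to_node) arcs for a token stream."""
    arcs = []
    pos = -1  # positions start at -1 so that a first posInc of 1 gives node 0
    for tok in tokens:
        pos += tok.pos_inc
        arcs.append((tok.term, pos, pos + tok.pos_len))
    return arcs


def dangling_nodes(arcs):
    """Nodes that an arc arrives at but no arc leaves, excluding the final node."""
    if not arcs:
        return []
    starts = {frm for _, frm, _ in arcs}
    ends = {to for _, _, to in arcs}
    return sorted(ends - starts - {max(ends)})


# Assumed stream after the stop filter removed "de" (values are illustrative).
STREAM = [
    Token("maillot", 1, 1),
    Token("om", 1, 3),
    Token("olympique", 0, 1),
    Token("marseille", 2, 1),  # posInc of 2 records the removed "de"
]

ARCS = to_arcs(STREAM)
DANGLING = dangling_nodes(ARCS)
```

Under these assumptions the arc for olympique ends at node 2 and marseille starts at node 3, so node 2 is the inferred state: neither attribute names it directly.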

bq. I think it's simpler to make sure that stopfilter doesn't break a graph like Robert suggested.

AFAICT Robert is suggesting a StopFilter *mode* that would *optionally* remove gaps.  IOW
its current behavior would remain (and be the default).
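
As a rough illustration of what such an optional mode could look like, the Python sketch below filters a linear (term, posInc) stream. `stop_filter` and its `keep_gaps` flag are hypothetical and are not an existing Lucene or Solr option. With `keep_gaps=True` the increments of removed tokens are carried over to the next surviving token, which is how the gap arises today; with `keep_gaps=False` they are dropped.

```python
def stop_filter(tokens, stopwords, keep_gaps=True):
    """Filter a list of (term, pos_inc) pairs.

    keep_gaps=True  : removed tokens leave a hole (their increments are added
                      to the next surviving token).
    keep_gaps=False : hypothetical gap-removing mode (increments are discarded).
    """
    out = []
    skipped = 0  # increments of stop words removed since the last kept token
    for term, pos_inc in tokens:
        if term in stopwords:
            skipped += pos_inc
            continue
        out.append((term, pos_inc + skipped if keep_gaps else pos_inc))
        skipped = 0
    return out


TOKENS = [("olympique", 1), ("de", 1), ("marseille", 1)]

WITH_GAP = stop_filter(TOKENS, {"de"})
WITHOUT_GAP = stop_filter(TOKENS, {"de"}, keep_gaps=False)
```

This only covers a linear stream. In a graph, closing a gap would presumably also require shortening the posLen of any token that spans the removed position (such as om), which the sketch does not attempt.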



> Multi-words query time synonyms
> -------------------------------
>
>                 Key: SOLR-11968
>                 URL: https://issues.apache.org/jira/browse/SOLR-11968
>             Project: Solr
>          Issue Type: Bug
>      Security Level: Public(Default Security Level. Issues are Public) 
>          Components: query parsers, Schema and Analysis
>    Affects Versions: master (8.0), 6.6.2
>         Environment: Centos 7.x
>            Reporter: Dominique Béjean
>            Assignee: Steve Rowe
>            Priority: Major
>
> I am trying multi-word query-time synonyms with Solr 6.6.2 and the SynonymGraphFilterFactory filter, as explained in this article:
>  [https://lucidworks.com/2017/04/18/multi-word-synonyms-solr-adds-query-time-support/]
>   
>  My field type is:
> {code:xml}
> <fieldType name="textSyn" class="solr.TextField" positionIncrementGap="100">
>      <analyzer type="index">
>        <tokenizer class="solr.StandardTokenizerFactory"/>
>        <filter class="solr.ElisionFilterFactory" ignoreCase="true" 
>              articles="lang/contractions_fr.txt"/>
>        <filter class="solr.LowerCaseFilterFactory"/>
>        <filter class="solr.ASCIIFoldingFilterFactory"/>
>        <filter class="solr.StopFilterFactory" words="stopwords.txt" ignoreCase="true"/>
>        <filter class="solr.FrenchMinimalStemFilterFactory"/>
>      </analyzer>
>      <analyzer type="query">
>        <tokenizer class="solr.StandardTokenizerFactory"/>
>        <filter class="solr.ElisionFilterFactory" ignoreCase="true" 
>              articles="lang/contractions_fr.txt"/>
>        <filter class="solr.LowerCaseFilterFactory"/>
>        <filter class="solr.SynonymGraphFilterFactory" synonyms="synonyms.txt"
>              ignoreCase="true" expand="true"/>
>        <filter class="solr.ASCIIFoldingFilterFactory"/>
>        <filter class="solr.StopFilterFactory" words="stopwords.txt" ignoreCase="true"/>
>        <filter class="solr.FrenchMinimalStemFilterFactory"/>
>      </analyzer>
>    </fieldType>{code}
>  
>  synonyms.txt contains the line:
> {code:java}
> om, olympique de marseille{code}
>  
>  stopwords.txt contains the word:
> {code:java}
> de{code}
>  
>  The order of the words in my query affects the query that edismax generates:
> {code:java}
> q={!edismax qf='name_text_gp' v=$qq}
>  &sow=false
>  &qq=...{code}
> With "qq=om maillot" or "qq=olympique de marseille maillot", I can see the synonym expansion. It works as expected:
> {code:java}
> "parsedquery_toString":"+(((+name_text_gp:olympiqu +name_text_gp:marseil +name_text_gp:maillot) name_text_gp:om))",
>  "parsedquery_toString":"+((name_text_gp:om (+name_text_gp:olympiqu +name_text_gp:marseil +name_text_gp:maillot)))",{code}
> With "qq=maillot om" or "qq=maillot olympique de marseille", I get the same generated query for both:
> {code:java}
> "parsedquery_toString":"+((name_text_gp:maillot) (name_text_gp:om))",
>  "parsedquery_toString":"+((name_text_gp:maillot) (name_text_gp:om))",{code}
> I don't understand these generated queries. The first one looks as if the synonym expansion is ignored, but the second one shows that it is not ignored and that only the synonym term is used.
>   
>  When I test the analysis for the field type, the synonyms are correctly expanded for all of these expressions:
> {code:java}
> om maillot  
>  maillot om
>  olympique de marseille maillot
>  maillot olympique de marseille{code}
> The resulting outputs always include the following terms (obviously not always in the same order):
> {code:java}
> olympiqu om marseil maillot {code}
>  
>  So, I suspect an issue with the edismax query parser.


