lucene-solr-user mailing list archives

From Markus Jelsma <markus.jel...@openindex.io>
Subject RE: Trouble with mm and SynonymQuery and KeywordRepeatFilter
Date Thu, 21 Dec 2017 14:28:54 GMT
Hello Steve,

Well, that is an interesting approach to the topic indeed. But I do not think it is possible
to obtain a list of all inflected forms for all words that also have roots in some synonym
file; the stemmers are not reversible.

Any other ideas?

Thanks,
Markus
 
-----Original message-----
> From:Steve Rowe <sarowe@gmail.com>
> Sent: Thursday 21st December 2017 0:10
> To: solr-user@lucene.apache.org
> Subject: Re: Trouble with mm and SynonymQuery and KeywordRepeatFilter
> 
> Hi Markus,
> 
> My suggestion: rewrite your synonyms to include the triggering word in the expanded synonyms
> list.  That way you won’t need KeywordRepeat/RemoveDuplicates filters, and mm=100% will
> work as you expect.
> 
> I don’t think this situation is a bug, since mm applies to the built query, not to
> the original query terms.
> 
> --
> Steve
> www.lucidworks.com
> 
> > On Dec 20, 2017, at 5:02 PM, Markus Jelsma <markus.jelsma@openindex.io> wrote:
> > 
> > Hello,
> > 
> > Yes of course, index-time synonyms lessen the query-time complexity and will solve
> > the mm problem. They also screw up IDF and cost us the flexibility of adding synonyms on demand. The first
> > we do not want, the second is impossible for us (very large main search index).
> > 
> > We are looking for a solution with mm that takes KeywordRepeat, stemming and synonym
> > expansion into consideration. To me the current working of mm in this case is a bug: I input
> > one term, so it should be treated as one term by mm, regardless of the expanded query terms.
> > 
> > Any query-time ideas to share? I am not well versed in the actual code dealing
> > with this specific subject; the code doesn't like me. I am fine if someone points me to the
> > code that tells mm about the number of original input terms, and what to do. If someone does,
> > please also explain why the change I want to make is a bad one, what to be aware of or what
> > to beware of, or what to take into account.
> > 
> > Also, am I the only one who regards this behaviour as a bug or, more subtly, as weird
> > unexpected behaviour?
> > 
> > Many many thanks!
> > Markus
> > 
> > -----Original message-----
> >> From:Shawn Heisey <apache@elyograg.org>
> >> Sent: Wednesday 20th December 2017 22:39
> >> To: solr-user@lucene.apache.org
> >> Subject: Re: Trouble with mm and SynonymQuery and KeywordRepeatFilter
> >> 
> >> On 12/19/2017 4:38 AM, Markus Jelsma wrote:
> >>> I have an interesting issue with mm and SynonymQuery and KeywordRepeatFilter.
> >>> We do query-time synonym expansion and use KeywordRepeat so that we find not only stemmed tokens.
> >>> Our synonyms are already preprocessed and contain only stemmed tokens. Synonym file contains:
> >>> traject,verbind
> >>> 
> >>> So, any non-root stem that ends up in a synonym is actually a search for
> >>> three terms: +DisjunctionMaxQuery(((title_nl:trajecten Synonym(title_nl:traject title_nl:verbind))))
> >>> 
> >>> But, our default mm requires that two terms must match if the input query
> >>> consists of two terms: 2<-1 5<-2 6<90%
> >>> 
> >>> So, a simple query looking for a plural (trajecten) will not match a document
> >>> where the title contains only its singular form: q=trajecten will not match document with
> >>> title_nl:"een traject"
> >> 
> >> I would think that doing synonym expansion at index time would remove
> >> any possible confusion about the number of terms at query time.  Queries
> >> that involve synonyms will be slightly less complex, but the index would
> >> be larger, so it's difficult to say whether those kinds of queries would
> >> be any faster or not.
> >> 
> >> There is one clear disadvantage to index-time synonym expansion: If you
> >> change your synonyms, you have to reindex.
> >> 
> >> Thanks,
> >> Shawn
> >> 
> >> 
> 
> 
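
For readers following the thread, the arithmetic behind the mm spec quoted above (2<-1 5<-2 6<90%) can be sketched in Python. This is a simplified illustration of the conditional min-should-match rules as documented for Solr's mm parameter, not Solr's actual code. The function name min_should_match is hypothetical, and percentages are assumed to truncate toward zero.

```python
def min_should_match(clause_count: int, spec: str) -> int:
    """Return how many optional clauses must match for a given mm spec.

    Illustrative sketch only; min_should_match is a hypothetical helper,
    not a Solr API.
    """
    result = clause_count
    spec = spec.strip()

    if "<" in spec:
        # Conditional spec: each "n<value" applies only when the clause
        # count is greater than n; otherwise the previous result stands.
        for condition in spec.split():
            bound, value = condition.split("<", 1)
            if clause_count <= int(bound):
                return result
            result = min_should_match(clause_count, value)
        return result

    if spec.endswith("%"):
        # Percentage of the clause count, truncated toward zero (assumption).
        calc = int(clause_count * int(spec[:-1]) / 100)
    else:
        calc = int(spec)

    # A negative value means "all but this many clauses".
    result = clause_count + calc if calc < 0 else calc
    return max(0, min(clause_count, result))


if __name__ == "__main__":
    spec = "2<-1 5<-2 6<90%"
    for n in (1, 2, 3, 6, 7):
        print(n, "clauses ->", min_should_match(n, spec), "must match")
```

Under this sketch, a query that expands to two or three optional clauses needs two of them to match. That is consistent with the behaviour described in the thread: a document containing only "traject" satisfies one clause and is rejected. How Solr actually counts the clauses of the expanded query is not settled in the thread.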
