lucene-solr-user mailing list archives

From Edward Ribeiro <edward.ribe...@gmail.com>
Subject Re: Edismax ignoring queries containing booleans
Date Fri, 10 Jan 2020 05:16:01 GMT
The fq is not affected by the mm parameter because it uses Solr's default
query parser (the standard "lucene" parser), which doesn't support mm. But
you can change the parser used by fq with local params, for example:
fq={!edismax}recordID:(10 20) or fq={!edismax mm=1}recordID:(10 20)
(even though that isn't needed here).
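
As a rough illustration of the fq variants above, here is a small Python sketch that builds the request URL with the local params URL-encoded. The host, port, and core name (localhost:8983, mycore) are placeholders made up for the example, not something from this thread; only the q and fq values come from the text above.

```python
from urllib.parse import urlencode, parse_qs

# Hypothetical Solr location; replace with your own host and core name.
SOLR_SELECT_URL = "http://localhost:8983/solr/mycore/select"


def build_select_url(params: dict) -> str:
    """Return a /select URL with every parameter value URL-encoded."""
    return SOLR_SELECT_URL + "?" + urlencode(params)


# fq parsed by edismax with its own mm, independent of the handler default.
fq_edismax_url = build_select_url({
    "q": "*:*",
    "fq": "{!edismax mm=1}recordID:(10 20)",
})

# Decoding the query string gives back the original values unchanged.
decoded = parse_qs(fq_edismax_url.split("?", 1)[1])
print(fq_edismax_url)
print(decoded["fq"][0])
```

Encoding matters here because the braces, the exclamation mark, and the spaces in the local params are not safe to send raw in a URL.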

Please let me know whether any of these suggestions, or any other you come
up with, solves the issue. And don't forget to test whichever approach you
pick so that you can catch any performance degradation.

Best,
Edward

On Fri, Jan 10, 2020 at 1:41 AM Edward Ribeiro <edward.ribeiro@gmail.com>
wrote:

> Hi Claire,
>
> > The only visual difference I think is the ~2 which came after the
> initial part of the parsed query:
> > Old Query start: +((recordID:[18 TO 18]) (recordID:[19 TO 19])
> (recordID:[20 TO 20]))~2
> > New Query start: +((recordID:[18 TO 18]) (recordID:[19 TO 19])
> (recordID:[20 TO 20]))
>
> The mm (minimum should match) parameter alters the behaviour of the OR
> clauses. See here:
> https://lucene.apache.org/solr/guide/8_3/the-dismax-query-parser.html#mm-minimum-should-match-parameter
> For example, if there is a query like `text:(toys OR children OR sales)`
> and mm=3, then all three terms are required to match, so the query
> becomes equivalent to `text:(toys AND children AND sales)`.
>
> In the "+((recordID:[18 TO 18]) (recordID:[19 TO 19]) (recordID:[20 TO
> 20]))~2" query, the "~2" suffix means that at least two of the three
> optional clauses (18, 19, and 20) must match. But recordID is
> single-valued, so a document can match at most one of them. Therefore the
> query returns no documents: the condition set up by mm (match at least two
> of 18, 19, and 20) can never be satisfied. With mm=1 the query would work
> as intended in this example.
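
The point that a single recordID value can satisfy at most one of the three clauses can be checked with a tiny Python sketch. This is only a model of the matching logic, not how Solr evaluates queries; the helper name clauses_matched is made up, and the indexed IDs (19 and 20) are taken from Claire's description that 18 doesn't exist.

```python
def clauses_matched(record_id: int, wanted: list[int]) -> int:
    """Count how many of the optional recordID clauses one document satisfies."""
    return sum(1 for value in wanted if value == record_id)


wanted_ids = [18, 19, 20]
indexed_ids = [19, 20]  # per the thread, no document has recordID 18

for required in (2, 1):
    # A document is a hit only if it satisfies at least `required` clauses.
    hits = [doc for doc in indexed_ids
            if clauses_matched(doc, wanted_ids) >= required]
    print(f"minimum should match {required}: {len(hits)} document(s)")
```

With a requirement of 2 no document qualifies, and with a requirement of 1 both documents do, which is the behaviour seen in the two debug outputs in this thread.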
>
> The mm parameter you use is: 0<1 2<-1 5<-2 6<90%, which can roughly be
> translated as:
>
> * 0<1 : If there are more than 0 clauses, then at least 1 must match.
> This is the rule that applies to queries with one or two clauses.
>
> * 2<-1 5<-2 6<90% : With 3 to 5 clauses (inclusive), all but one must
> match (in your example there are 3 numbers, so at least 2 must match;
> that's the reason for the ~2). With exactly 6 clauses, 4 must match
> (6 - 2). With more than 6 clauses, 90% of them must match, rounded down
> (e.g., with 10 clauses at least 9 must match).
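
To make the mm arithmetic concrete, here is a minimal Python sketch of how a conditional mm spec could be evaluated for a given number of optional clauses. It is my own simplified re-implementation based on the rules described in the Solr reference guide, not Solr's actual code; the function name min_should_match is hypothetical, and edge cases (spaces around "<", results below 1) are not handled.

```python
def min_should_match(clause_count: int, spec: str) -> int:
    """Approximate the number of optional clauses that must match."""

    def simple(expr: str) -> int:
        # "90%" is a percentage of the clause count, "-1" means "all but one".
        if expr.endswith("%"):
            calc = clause_count * int(expr[:-1]) / 100
        else:
            calc = int(expr)
        # int() truncates toward zero, so percentages are rounded down.
        return clause_count + int(calc) if calc < 0 else int(calc)

    spec = spec.strip()
    if "<" not in spec:
        return min(simple(spec), clause_count)

    # Below the lowest threshold every clause is required.
    result = clause_count
    for part in spec.split():
        threshold, expr = part.split("<", 1)
        if clause_count <= int(threshold):
            break  # this condition and the later ones do not apply
        result = simple(expr)
    return min(result, clause_count)


THREAD_SPEC = "0<1 2<-1 5<-2 6<90%"
for n in (3, 6, 10):
    print(n, "clauses ->", min_should_match(n, THREAD_SPEC))
```

For the spec in this thread the sketch gives 2 for 3 clauses, 4 for 6 clauses, and 9 for 10 clauses.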
>
> > There shouldn't be a problem using mm with edismax right? Or does the
> problem lie with the structure of my qf/pf and then adding mm?
>
> Nope. There’s no problem using mm with edismax, nor does the problem lie
> in qf/pf. As you dig
>
> > I can see this is a change to default behaviour, but does it mean I
> should be passing mm in the query now rather than just at config level?
>
> I see a couple of approaches to solve this issue:
>
> 1) Removing the mm parameter from solrconfig. But it was probably set up
> for a reason, so you should check beforehand. In this case, you could issue
> mm=0<1 2<-1 5<-2 6<90% as a query parameter if necessary.
>
> 2) Adding mm=1 as a query parameter whenever you search for recordID.
> Issuing the parameter in the query overrides the mm value that was set up
> in solrconfig, for that particular query only.
>
> 3) Doing a match-all query (q=*:*) and moving the recordID query to a
> filter query: fq=recordID:(18 OR 19 OR 20). The fq is not affected by the
> mm parameter, or so it seems. There is no need to change mm in solrconfig
> or to add mm as a query parameter.
>
> Personally, I would go with either 2) or 3).
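
Approaches 2) and 3) can be sketched as request parameters in Python. This is only an illustration: the variable names are made up, and the path is left relative because the host and core are not given in this thread.

```python
from urllib.parse import urlencode

RECORD_QUERY = "recordID:(18 OR 19 OR 20)"

# Approach 2: keep the query in q and override the configured mm
# for this request only.
approach_2 = {"q": RECORD_QUERY, "mm": "1"}

# Approach 3: match everything in q and restrict by recordID in a
# filter query, which the handler's mm does not touch.
approach_3 = {"q": "*:*", "fq": RECORD_QUERY}

for name, params in (("approach 2", approach_2), ("approach 3", approach_3)):
    # Prepend your own Solr host and core to this relative path.
    print(name, "/select?" + urlencode(params))
```

One difference to keep in mind when testing: a filter query only restricts the result set and does not contribute to the score, so with approach 3 ranking comes from q alone.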
>
> Best,
> Edward
>
> On Thu, Jan 9, 2020 at 7:47 AM Claire Pollard <claire.pollard@imagen.io>
> wrote:
> >
> > Also, I've found this earlier bug report, which highlights the issue
> > with ))~2
> >
> > https://issues.apache.org/jira/browse/SOLR-8812
> >
> > mm is set at config, but not explicitly in the query...
> >
> > I can see this is a change to default behaviour, but does it mean I
> should be passing mm in the query now rather than just at config level?
> >
> > -----Original Message-----
> > From: Claire Pollard <claire.pollard@imagen.io>
> > Sent: 09 January 2020 10:23
> > To: solr-user@lucene.apache.org
> > Subject: RE: Edismax ignoring queries containing booleans
> >
> > Hey Edward,
> >
> > Thanks for the tips.
> >
> > I've cleaned up my solrconfig, removed the duplicate df, tabs and
> > newlines, and tried commenting out the bits you've suggested and adding
> > them back in bit by bit, and it seems mm was the thing that was breaking
> > the query for me.
> >
> > Without it, the query returns 2 documents as expected.
> >
> > "debug":{
> >     "rawquerystring":"recordID:(18 OR 19 OR 20)",
> >     "querystring":"recordID:(18 OR 19 OR 20)",
> >     "parsedquery":"+((recordID:[18 TO 18]) (recordID:[19 TO 19])
> (recordID:[20 TO 20])) DisjunctionMaxQuery(((text:\"19 20\"~100)^0.2 |
> (annotations:\"19 20\"~100)^0.6 | (collectionTitle:\"19 20\"~100)^2.0 |
> collectionDescription:\"19 20\"~100 | (title:\"19 20\"~100)^2.1 |
> (Test_FR:\"19 20\"~100)^1.1 | (Test_DE:\"19 20\"~100)^1.1 | (Test_AR:\"19
> 20\"~100)^1.1))",
> >     "parsedquery_toString":"+(recordID:[18 TO 18] recordID:[19 TO 19]
> recordID:[20 TO 20]) ((text:\"19 20\"~100)^0.2 | (annotations:\"19
> 20\"~100)^0.6 | (collectionTitle:\"19 20\"~100)^2.0 |
> collectionDescription:\"19 20\"~100 | (title:\"19 20\"~100)^2.1 |
> (Test_FR:\"19 20\"~100)^1.1 | (Test_DE:\"19 20\"~100)^1.1 | (Test_AR:\"19
> 20\"~100)^1.1)",
> >     "explain":{
> >       "2CBF8A49-CA2D-4e42-88F2-3790922EF415":"\n1.0 = sum of:\n  1.0 =
> sum of:\n    1.0 = recordID:[19 TO 19]\n",
> >       "F73CFBC7-2CD2-4aab-B8C1-9D19D427EAFB":"\n1.0 = sum of:\n  1.0 =
> sum of:\n    1.0 = recordID:[20 TO 20]\n"},
> >
> > The only visual difference I think is the ~2 which came after the
> initial part of the parsed query:
> >
> > Old Query start: +((recordID:[18 TO 18]) (recordID:[19 TO 19])
> > (recordID:[20 TO 20]))~2
> > New Query start: +((recordID:[18 TO 18]) (recordID:[19 TO 19])
> > (recordID:[20 TO 20]))
> >
> > There shouldn't be a problem using mm with edismax right? Or does the
> problem lie with the structure of my qf/pf and then adding mm?
> >
> > Cheers,
> > Claire.
> >
> > -----Original Message-----
> > From: Edward Ribeiro <edward.ribeiro@gmail.com>
> > Sent: 09 January 2020 02:28
> > To: solr-user@lucene.apache.org
> > Subject: Re: Edismax ignoring queries containing booleans
> >
> > Hi Claire,
> >
> > Unfortunately, I didn't see anything in the debug explain that could
> > potentially be the source of the problem. Like Saurabh, I tested it on a
> > core and it worked for me.
> >
> > I suggest that you simplify the solrconfig (commenting out qf, mm, the
> > spellchecker config and pf, for example) and reload the core. If the
> > query works, then reinsert the settings one by one, reloading the core
> > each time and checking whether the query still works.
> >
> > A few remarks based on a snippet of the solrconfig you posted on a
> previous
> > e-mail:
> >
> > * Your solrconfig.xml defines df two times (the debug shows
> "df":["text", "text"]);
> >
> > * There are a couple of character references like &#x09;, &#x0D; and
> > &#x0A;. It would be nice to remove them.
> >
> > Please let us know if you find out why. :)
> >
> > Best,
> > Edward
> >
> >
> > On Wed, Jan 8, 2020, 1:00 PM Claire Pollard <claire.pollard@imagen.io>
> > wrote:
> >
> > > It would be lovely to be able to use a range to complete my searches,
> > > but sadly documents aren't necessarily sequential, so I might want,
> > > say, 18, 24 or 30 in future.
> > >
> > > I've re-run the query with debug on. Is there anything here that looks
> > > unusual? Thanks.
> > >
> > > {
> > >   "responseHeader":{
> > >     "status":0,
> > >     "QTime":75,
> > >     "params":{
> > >       "mm":"\r\n       0<1 2<-1 5<-2 6<90%\r\n      ",
> > >       "spellcheck.collateExtendedResults":"true",
> > >       "df":["text",
> > >         "text"],
> > >       "q.alt":"*:*",
> > >       "ps":"100",
> > >       "spellcheck.dictionary":["default",
> > >         "wordbreak"],
> > >       "bf":"",
> > >       "echoParams":"all",
> > >       "fl":"*,score",
> > >       "spellcheck.maxCollations":"5",
> > >       "rows":"10",
> > >       "spellcheck.alternativeTermCount":"5",
> > >       "spellcheck.extendedResults":"true",
> > >       "q":"recordID:(18 OR 19 OR 20)",
> > >       "defType":"edismax",
> > >       "spellcheck.maxResultsForSuggest":"5",
> > >       "qf":"\n\t\t\t\t\n\t\t\t\t\n\t\t\t\t\ttext^0.4 recordID^10.0
> > > annotations^0.5 collectionTitle^1.9 collectionDescription^0.9
> > > title^2.0
> > > Test_FR^1.0 Test_DE^1.0 Test_AR^1.0 genre^1.0 genre_fr^1.0
> > > french2^1.0\r\n\n\t\t\t\t\n\t\t\t",
> > >       "spellcheck":"on",
> > >       "pf":"\n\t\t\t\t\n\t\t\t\t\n\t\t\t\t\ttext^0.2 recordID^10.0
> > > annotations^0.6 collectionTitle^2.0 collectionDescription^1.0
> > > title^2.1
> > > Test_FR^1.1 Test_DE^1.1 Test_AR^1.1 genre^1.1 genre_fr^1.1
> > > french2^1.1\r\n\n\t\t\t\t\n\t\t\t",
> > >       "spellcheck.count":"10",
> > >       "debugQuery":"on",
> > >       "_":"1578499092576",
> > >       "spellcheck.collate":"true"}},
> > >   "response":{"numFound":0,"start":0,"maxScore":0.0,"docs":[]
> > >   },
> > >   "spellcheck":{
> > >     "suggestions":[],
> > >     "correctlySpelled":false,
> > >     "collations":[]},
> > >   "debug":{
> > >     "rawquerystring":"recordID:(18 OR 19 OR 20)",
> > >     "querystring":"recordID:(18 OR 19 OR 20)",
> > >     "parsedquery":"+((recordID:[18 TO 18]) (recordID:[19 TO 19])
> > > (recordID:[20 TO 20]))~2 DisjunctionMaxQuery(((text:\"19 20\"~100)^0.2
> > > |
> > > (annotations:\"19 20\"~100)^0.6 | (collectionTitle:\"19 20\"~100)^2.0
> > > |
> > > collectionDescription:\"19 20\"~100 | (title:\"19 20\"~100)^2.1 |
> > > (Test_FR:\"19 20\"~100)^1.1 | (Test_DE:\"19 20\"~100)^1.1 |
> > > (Test_AR:\"19 20\"~100)^1.1))",
> > >     "parsedquery_toString":"+((recordID:[18 TO 18] recordID:[19 TO 19]
> > > recordID:[20 TO 20])~2) ((text:\"19 20\"~100)^0.2 | (annotations:\"19
> > > 20\"~100)^0.6 | (collectionTitle:\"19 20\"~100)^2.0 |
> > > collectionDescription:\"19 20\"~100 | (title:\"19 20\"~100)^2.1 |
> > > (Test_FR:\"19 20\"~100)^1.1 | (Test_DE:\"19 20\"~100)^1.1 |
> > > (Test_AR:\"19 20\"~100)^1.1)",
> > >     "explain":{},
> > >     "QParser":"ExtendedDismaxQParser",
> > >     "altquerystring":null,
> > >     "boost_queries":null,
> > >     "parsed_boost_queries":[],
> > >     "boostfuncs":[""],
> > >     "timing":{
> > >       "time":75.0,
> > >       "prepare":{
> > >         "time":35.0,
> > >         "query":{
> > >           "time":35.0},
> > >         "facet":{
> > >           "time":0.0},
> > >         "facet_module":{
> > >           "time":0.0},
> > >         "mlt":{
> > >           "time":0.0},
> > >         "highlight":{
> > >           "time":0.0},
> > >         "stats":{
> > >           "time":0.0},
> > >         "expand":{
> > >           "time":0.0},
> > >         "terms":{
> > >           "time":0.0},
> > >         "spellcheck":{
> > >           "time":0.0},
> > >         "debug":{
> > >           "time":0.0}},
> > >       "process":{
> > >         "time":38.0,
> > >         "query":{
> > >           "time":29.0},
> > >         "facet":{
> > >           "time":0.0},
> > >         "facet_module":{
> > >           "time":0.0},
> > >         "mlt":{
> > >           "time":0.0},
> > >         "highlight":{
> > >           "time":0.0},
> > >         "stats":{
> > >           "time":0.0},
> > >         "expand":{
> > >           "time":0.0},
> > >         "terms":{
> > >           "time":0.0},
> > >         "spellcheck":{
> > >           "time":6.0},
> > >         "debug":{
> > >           "time":1.0}}}}}
> > >
> > > -----Original Message-----
> > > From: Edward Ribeiro <edward.ribeiro@gmail.com>
> > > Sent: 07 January 2020 01:05
> > > To: solr-user@lucene.apache.org
> > > Subject: Re: Edismax ignoring queries containing booleans
> > >
> > > Hi Claire,
> > >
> > > You can add the following parameter `&debug=all` on the URL to bring
> > > back debugging info and share with us (if you are using the Solr admin
> > > UI you should check the `debugQuery` checkbox).
> > >
> > > Also, if you are searching a sequence of values you could perform a
> > > range
> > > query: recordID:[18 TO 20]
> > >
> > > Best,
> > > Edward
> > >
> > > On Mon, Jan 6, 2020 at 10:46 AM Claire Pollard
> > > <claire.pollard@imagen.io>
> > > wrote:
> > > >
> > > > Ok... It doesn't work for me. I'm fairly new to Solr so any help
> > > > would be appreciated!
> > > >
> > > > My managed-schema field and field type look like this:
> > > >
> > > > <field name="recordID" type="long" indexed="true" stored="true" required="true" multiValued="false" />
> > > > <fieldType name="long" class="solr.LongPointField" sortMissingLast="true" omitNorms="true" />
> > > >
> > > > And my solrconfig.xml select/query handlers look like this:
> > > >
> > > >         <requestHandler name="/select" class="solr.SearchHandler">
> > > >                 <lst name="defaults">
> > > >                         <str name="echoParams">all</str>
> > > >                         <!-- Query settings -->
> > > >                         <str name="defType">edismax</str>
> > > >                         <str name="qf">
> > > >                                 &#x09;text^0.4 recordID^10.0 annotations^0.5 collectionTitle^1.9 collectionDescription^0.9 title^2.0 Test_FR^1.0 Test_DE^1.0 Test_AR^1.0 genre^1.0 genre_fr^1.0 french2^1.0&#x0D;&#x0A;
> > > >                         </str>
> > > >                         <str name="df">text</str>
> > > >                         <str name="q.alt">*:*</str>
> > > >                         <str name="rows">10</str>
> > > >                         <str name="fl">*,score</str>
> > > >                         <str name="pf">
> > > >                                 &#x09;text^0.2 recordID^10.0 annotations^0.6 collectionTitle^2.0 collectionDescription^1.0 title^2.1 Test_FR^1.1 Test_DE^1.1 Test_AR^1.1 genre^1.1 genre_fr^1.1 french2^1.1&#x0D;&#x0A;</str>
> > > >                         <str name="bf" />
> > > >                         <str name="mm">&#x0D;&#x0A;       0&lt;1 2&lt;-1 5&lt;-2 6&lt;90%&#x0D;&#x0A;      </str>
> > > >                         <int name="ps">100</int>
> > > >                         <!--SpellChecking -->
> > > >                         <str name="df">text</str>
> > > >                         <!-- Solr will use suggestions from both the 'default' spellchecker
> > > >      and from the 'wordbreak' spellchecker and combine them.
> > > >      collations (re-written queries) can include a combination of
> > > >      corrections from both spellcheckers -->
> > > >                         <str name="spellcheck.dictionary">default</str>
> > > >                         <str name="spellcheck.dictionary">wordbreak</str>
> > > >                         <str name="spellcheck">on</str>
> > > >                         <str name="spellcheck.extendedResults">true</str>
> > > >                         <str name="spellcheck.count">10</str>
> > > >                         <str name="spellcheck.alternativeTermCount">5</str>
> > > >                         <str name="spellcheck.maxResultsForSuggest">5</str>
> > > >                         <str name="spellcheck.collate">true</str>
> > > >                         <str name="spellcheck.collateExtendedResults">true</str>
> > > >                         <str name="spellcheck.maxCollations">5</str>
> > > >                 </lst>
> > > >                 <arr name="last-components">
> > > >                         <str>spellcheck</str>
> > > >                 </arr>
> > > >                 <!-- In addition to defaults, "appends" params can be specified
> > > >          to identify values which should be appended to the list of
> > > >          multi-val params from the query (or the existing "defaults").
> > > >       -->
> > > >         </requestHandler>
> > > >
> > > >         <requestHandler name="/query" class="solr.SearchHandler">
> > > >                 <lst name="defaults">
> > > >                         <str name="echoParams">explicit</str>
> > > >                         <str name="wt">json</str>
> > > >                         <str name="indent">true</str>
> > > >                         <str name="df">text</str>
> > > >                 </lst>
> > > >         </requestHandler>
> > > >
> > > > Is there anything else that might be useful in helping diagnose
> > > > what's going wrong for me?
> > > >
> > > > Cheers,
> > > > Claire.
> > > >
> > > > -----Original Message-----
> > > > From: Saurabh Sharma <saurabh.infoedge@gmail.com>
> > > > Sent: 06 January 2020 11:20
> > > > To: solr-user@lucene.apache.org
> > > > Subject: Re: Edismax ignoring queries containing booleans
> > > >
> > > > It should work well. I have just tested the same with 8.3.0.
> > > >
> > > > Thanks
> > > > Saurabh Sharma
> > > >
> > > > On Mon, Jan 6, 2020, 4:31 PM Claire Pollard
> > > > <claire.pollard@imagen.io>
> > > > wrote:
> > > >
> > > > > I'm using:
> > > > >
> > > > > recordID:(18 OR 19 OR 20)
> > > > >
> > > > > > Which should return 2 records (as 18 doesn't exist), but it
> > > > > > returns none.
> > > > > > recordID is a LongPointField (sorry I said Int in my previous
> > > > > > message).
> > > > >
> > > > > -----Original Message-----
> > > > > From: Saurabh Sharma <saurabh.infoedge@gmail.com>
> > > > > Sent: 06 January 2020 10:35
> > > > > To: solr-user@lucene.apache.org
> > > > > Subject: Re: Edismax ignoring queries containing booleans
> > > > >
> > > > > Please share the query which you are creating.
> > > > >
> > > > > On Mon, Jan 6, 2020, 3:52 PM Claire Pollard
> > > > > <claire.pollard@imagen.io>
> > > > > wrote:
> > > > >
> > > > > > > In Solr 8.3.0 I've got an edismax query parser in my search
> > > > > > > handler, and it seems to be ignoring Boolean operators such
> > > > > > > as AND and OR when searching using an IntPointField.
> > > > > > >
> > > > > > > I was hoping to use a query to this field to return a batch
> > > > > > > of documents with non-sequential IDs, so a range would be
> > > > > > > inappropriate.
> > > > > > >
> > > > > > > We had a previous 4.10.2 instance of Solr which uses the now
> > > > > > > deprecated Trie fields, and these seem to search without issue
> > > > > > > using boolean operators.
> > > > > > >
> > > > > > > Is there something extra I need to do with my setup for
> > > > > > > PointFields to use booleans, or should they work by default?
> > > > > >
> > > > > > Cheers,
> > > > > > Claire.
> > > > > >
> > > > >
> > >
> > >
>
