lucene-solr-user mailing list archives

From Edward Ribeiro <edward.ribe...@gmail.com>
Subject Re: Edismax ignoring queries containing booleans
Date Fri, 10 Jan 2020 04:41:00 GMT
Hi Claire,

> The only visual difference I think is the ~2 which came after the initial
> part of the parsed query:
> Old Query start: +((recordID:[18 TO 18]) (recordID:[19 TO 19])
> (recordID:[20 TO 20]))~2
> New Query start: +((recordID:[18 TO 18]) (recordID:[19 TO 19])
> (recordID:[20 TO 20]))

The mm (minimum should match) parameter alters the behaviour of the OR
clauses. See here:
https://lucene.apache.org/solr/guide/8_3/the-dismax-query-parser.html#mm-minimum-should-match-parameter
For example, if there is a query like `text:(toys OR children OR sales)`
and mm=3, then all three terms are required to match. The query is then
equivalent to `text:(toys AND children AND sales)`.

In the "+((recordID:[18 TO 18]) (recordID:[19 TO 19]) (recordID:[20 TO
20]))~2" query, the "))~2" part means that at least two of the three
optional clauses (18, 19, and 20) must match. But recordID is single-valued,
so a document can match at most one of them. Therefore, the query returns no
documents, because no document can ever satisfy the condition set by mm
(match at least two of 18, 19, and 20). With mm=1 the query would work as
intended in this example.
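
To make the point concrete, here is a toy model in Python. It is not Solr
code, just a sketch that assumes every document holds exactly one recordID
value; the `matches` helper and the sample IDs are made up for illustration.

```python
# Toy model, not Solr code: each document has exactly one recordID value.
def matches(doc_record_id, wanted_ids, min_should_match):
    """True if the document satisfies at least min_should_match optional clauses."""
    satisfied = sum(1 for wanted in wanted_ids if doc_record_id == wanted)
    return satisfied >= min_should_match

docs = [19, 20, 21]      # hypothetical index contents (18 does not exist)
wanted = [18, 19, 20]    # the three optional clauses of the query

hits_mm2 = [d for d in docs if matches(d, wanted, 2)]
hits_mm1 = [d for d in docs if matches(d, wanted, 1)]

print(hits_mm2)  # []
print(hits_mm1)  # [19, 20]
```

A single-valued field can satisfy only one of the clauses, so requiring two
always gives an empty result.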

The mm parameter you use is: 0<1 2<-1 5<-2 6<90%, which can roughly be
translated as:

* 0<1 : If there are more than 0 terms, then at least 1 must match. This
applies to one or two terms, until the next condition takes over.

* 2<-1 5<-2 6<90% : Between 3 and 5 (inclusive) terms, match all but one (in
your example there are 3 numbers, so it requires at least 2 matches; that’s
the reason for the ~2). If there are 6 terms, then match 4 (6 - 2), and above
6 terms, match 90% of the terms (e.g., if there are 10 clauses then at least
9 must match).
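
The arithmetic can be checked with a small Python sketch. It follows the
conditional mm syntax as described in the reference guide linked above; it is
my own approximation, not Solr's implementation, and the function name
`min_should_match` is made up for this example.

```python
def min_should_match(clause_count, spec):
    """Evaluate an mm spec such as '0<1 2<-1 5<-2 6<90%' for clause_count clauses."""

    def simple(expr, n):
        # A plain integer or percentage; negative values mean "all but this many".
        if expr.endswith("%"):
            calc = int(n * int(expr[:-1]) / 100)  # truncates toward zero
        else:
            calc = int(expr)
        value = n + calc if calc < 0 else calc
        return min(value, n)

    result = clause_count  # with no applicable condition, everything is required
    for part in spec.split():
        if "<" not in part:
            return simple(part, clause_count)
        bound, expr = part.split("<", 1)
        if clause_count <= int(bound):
            return result  # this and later conditions do not apply
        result = simple(expr, clause_count)
    return result

spec = "0<1 2<-1 5<-2 6<90%"
for n in (1, 3, 6, 10):
    print(n, "clauses ->", min_should_match(n, spec))
```

For the three-clause recordID query this gives 2, which is the ~2 shown in
the parsed query.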

> There shouldn't be a problem using mm with edismax right? Or does the
> problem lie with the structure of my qf/pf and then adding mm?

Nope. There’s no problem using mm with edismax, nor does the problem lie
with qf/pf. As you dig

> I can see this is a change to default behaviour, but does it mean I
> should be passing mm in the query now rather than just at config level?

I see a couple of approaches to solve this issue:

1) Removing the mm parameter from solrconfig. But it was probably set up for
a reason, so you should check beforehand. In this case, you could pass
mm=0<1 2<-1 5<-2 6<90% as a query parameter when necessary.

2) Adding mm=1 as a query parameter whenever you search for recordID.
Passing the parameter in the query overrides, for that particular query, the
mm value that was set up in solrconfig.

3) Doing a match-all query (q=*:*) and moving the recordID query to a
filter query: fq=recordID:(18 OR 19 OR 20). The fq is not affected by the mm
parameter, or so it seems. There is no need to change mm in solrconfig or to
add mm as a query parameter.

Personally, I would go with either 2) or 3).
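
As a rough illustration of 2) and 3), this Python snippet only builds the
request URLs. The host and the core name "mycore" are placeholders, not
something taken from your setup.

```python
from urllib.parse import urlencode

# Hypothetical host and core name; replace with your own.
BASE = "http://localhost:8983/solr/mycore/select"

# Option 2: override the configured mm for this request only.
option2 = BASE + "?" + urlencode({"q": "recordID:(18 OR 19 OR 20)", "mm": "1"})

# Option 3: match everything, then restrict the result with a filter query.
option3 = BASE + "?" + urlencode({"q": "*:*", "fq": "recordID:(18 OR 19 OR 20)"})

print(option2)
print(option3)
```

Both requests leave the mm default in solrconfig untouched.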

Best,
Edward

On Thu, Jan 9, 2020 at 7:47 AM Claire Pollard <claire.pollard@imagen.io>
wrote:
>
> Also, I've found this bug from previous which highlights the issue with
> ))~2
>
> https://issues.apache.org/jira/browse/SOLR-8812
>
> mm is set at config, but not explicitly in the query...
>
> I can see this is a change to default behaviour, but does it mean I
> should be passing mm in the query now rather than just at config level?
>
> -----Original Message-----
> From: Claire Pollard <claire.pollard@imagen.io>
> Sent: 09 January 2020 10:23
> To: solr-user@lucene.apache.org
> Subject: RE: Edismax ignoring queries containing booleans
>
> Hey Edward,
>
> Thanks for the tips.
>
> I've cleaned up my solrconfig, removed the duplicate df, tabs and
> newlines, and tried commenting out the bits you've suggested and adding
> them back in bit by bit, and it seems mm was the thing which is breaking
> the query for me.
>
> Without it, the query returns 2 documents as expected.
>
> "debug":{
>     "rawquerystring":"recordID:(18 OR 19 OR 20)",
>     "querystring":"recordID:(18 OR 19 OR 20)",
>     "parsedquery":"+((recordID:[18 TO 18]) (recordID:[19 TO 19])
> (recordID:[20 TO 20])) DisjunctionMaxQuery(((text:\"19 20\"~100)^0.2 |
> (annotations:\"19 20\"~100)^0.6 | (collectionTitle:\"19 20\"~100)^2.0 |
> collectionDescription:\"19 20\"~100 | (title:\"19 20\"~100)^2.1 |
> (Test_FR:\"19 20\"~100)^1.1 | (Test_DE:\"19 20\"~100)^1.1 | (Test_AR:\"19
> 20\"~100)^1.1))",
>     "parsedquery_toString":"+(recordID:[18 TO 18] recordID:[19 TO 19]
> recordID:[20 TO 20]) ((text:\"19 20\"~100)^0.2 | (annotations:\"19
> 20\"~100)^0.6 | (collectionTitle:\"19 20\"~100)^2.0 |
> collectionDescription:\"19 20\"~100 | (title:\"19 20\"~100)^2.1 |
> (Test_FR:\"19 20\"~100)^1.1 | (Test_DE:\"19 20\"~100)^1.1 | (Test_AR:\"19
> 20\"~100)^1.1)",
>     "explain":{
>       "2CBF8A49-CA2D-4e42-88F2-3790922EF415":"\n1.0 = sum of:\n  1.0 =
> sum of:\n    1.0 = recordID:[19 TO 19]\n",
>       "F73CFBC7-2CD2-4aab-B8C1-9D19D427EAFB":"\n1.0 = sum of:\n  1.0 =
> sum of:\n    1.0 = recordID:[20 TO 20]\n"},
>
> The only visual difference I think is the ~2 which came after the initial
> part of the parsed query:
>
> Old Query start: +((recordID:[18 TO 18]) (recordID:[19 TO 19])
> (recordID:[20 TO 20]))~2
> New Query start: +((recordID:[18 TO 18]) (recordID:[19 TO 19])
> (recordID:[20 TO 20]))
>
> There shouldn't be a problem using mm with edismax right? Or does the
> problem lie with the structure of my qf/pf and then adding mm?
>
> Cheers,
> Claire.
>
> -----Original Message-----
> From: Edward Ribeiro <edward.ribeiro@gmail.com>
> Sent: 09 January 2020 02:28
> To: solr-user@lucene.apache.org
> Subject: Re: Edismax ignoring queries containing booleans
>
> Hi Claire,
>
> Unfortunately I didn't see anything in the debug explain that could
> potentially be the source of the problem. Like Saurabh, I tested on a core
> and it worked for me.
>
> I suggest that you simplify the solrconfig (commenting out qf, mm,
> spellchecker config and pf, for example) and reload the core. If the query
> works, then reinsert the config entries one by one, reloading the core each
> time and checking whether the query still works.
>
> A few remarks based on a snippet of the solrconfig you posted on a
> previous e-mail:
>
> * Your solrconfig.xml defines df two times (the debug shows
> "df":["text", "text"]);
>
> * There are a couple of character codes like &#x09;, &#x0D; and &#x0A;.
> It would be nice to remove them;
>
> Please let us know if you find out why. :)
>
> Best,
> Edward
>
>
> On Wed, Jan 8, 2020 at 13:00, Claire Pollard <claire.pollard@imagen.io>
> wrote:
>
> > It would be lovely to be able to use range to complete my searches,
> > but sadly documents aren't necessarily sequential so I might want say
> > 18, 24 or 30 in future.
> >
> > I've re-run the query with debug on. Is there anything here that looks
> > unusual? Thanks.
> >
> > {
> >   "responseHeader":{
> >     "status":0,
> >     "QTime":75,
> >     "params":{
> >       "mm":"\r\n       0<1 2<-1 5<-2 6<90%\r\n      ",
> >       "spellcheck.collateExtendedResults":"true",
> >       "df":["text",
> >         "text"],
> >       "q.alt":"*:*",
> >       "ps":"100",
> >       "spellcheck.dictionary":["default",
> >         "wordbreak"],
> >       "bf":"",
> >       "echoParams":"all",
> >       "fl":"*,score",
> >       "spellcheck.maxCollations":"5",
> >       "rows":"10",
> >       "spellcheck.alternativeTermCount":"5",
> >       "spellcheck.extendedResults":"true",
> >       "q":"recordID:(18 OR 19 OR 20)",
> >       "defType":"edismax",
> >       "spellcheck.maxResultsForSuggest":"5",
> >       "qf":"\n\t\t\t\t\n\t\t\t\t\n\t\t\t\t\ttext^0.4 recordID^10.0
> > annotations^0.5 collectionTitle^1.9 collectionDescription^0.9
> > title^2.0
> > Test_FR^1.0 Test_DE^1.0 Test_AR^1.0 genre^1.0 genre_fr^1.0
> > french2^1.0\r\n\n\t\t\t\t\n\t\t\t",
> >       "spellcheck":"on",
> >       "pf":"\n\t\t\t\t\n\t\t\t\t\n\t\t\t\t\ttext^0.2 recordID^10.0
> > annotations^0.6 collectionTitle^2.0 collectionDescription^1.0
> > title^2.1
> > Test_FR^1.1 Test_DE^1.1 Test_AR^1.1 genre^1.1 genre_fr^1.1
> > french2^1.1\r\n\n\t\t\t\t\n\t\t\t",
> >       "spellcheck.count":"10",
> >       "debugQuery":"on",
> >       "_":"1578499092576",
> >       "spellcheck.collate":"true"}},
> >   "response":{"numFound":0,"start":0,"maxScore":0.0,"docs":[]
> >   },
> >   "spellcheck":{
> >     "suggestions":[],
> >     "correctlySpelled":false,
> >     "collations":[]},
> >   "debug":{
> >     "rawquerystring":"recordID:(18 OR 19 OR 20)",
> >     "querystring":"recordID:(18 OR 19 OR 20)",
> >     "parsedquery":"+((recordID:[18 TO 18]) (recordID:[19 TO 19])
> > (recordID:[20 TO 20]))~2 DisjunctionMaxQuery(((text:\"19 20\"~100)^0.2
> > |
> > (annotations:\"19 20\"~100)^0.6 | (collectionTitle:\"19 20\"~100)^2.0
> > |
> > collectionDescription:\"19 20\"~100 | (title:\"19 20\"~100)^2.1 |
> > (Test_FR:\"19 20\"~100)^1.1 | (Test_DE:\"19 20\"~100)^1.1 |
> > (Test_AR:\"19 20\"~100)^1.1))",
> >     "parsedquery_toString":"+((recordID:[18 TO 18] recordID:[19 TO 19]
> > recordID:[20 TO 20])~2) ((text:\"19 20\"~100)^0.2 | (annotations:\"19
> > 20\"~100)^0.6 | (collectionTitle:\"19 20\"~100)^2.0 |
> > collectionDescription:\"19 20\"~100 | (title:\"19 20\"~100)^2.1 |
> > (Test_FR:\"19 20\"~100)^1.1 | (Test_DE:\"19 20\"~100)^1.1 |
> > (Test_AR:\"19 20\"~100)^1.1)",
> >     "explain":{},
> >     "QParser":"ExtendedDismaxQParser",
> >     "altquerystring":null,
> >     "boost_queries":null,
> >     "parsed_boost_queries":[],
> >     "boostfuncs":[""],
> >     "timing":{
> >       "time":75.0,
> >       "prepare":{
> >         "time":35.0,
> >         "query":{
> >           "time":35.0},
> >         "facet":{
> >           "time":0.0},
> >         "facet_module":{
> >           "time":0.0},
> >         "mlt":{
> >           "time":0.0},
> >         "highlight":{
> >           "time":0.0},
> >         "stats":{
> >           "time":0.0},
> >         "expand":{
> >           "time":0.0},
> >         "terms":{
> >           "time":0.0},
> >         "spellcheck":{
> >           "time":0.0},
> >         "debug":{
> >           "time":0.0}},
> >       "process":{
> >         "time":38.0,
> >         "query":{
> >           "time":29.0},
> >         "facet":{
> >           "time":0.0},
> >         "facet_module":{
> >           "time":0.0},
> >         "mlt":{
> >           "time":0.0},
> >         "highlight":{
> >           "time":0.0},
> >         "stats":{
> >           "time":0.0},
> >         "expand":{
> >           "time":0.0},
> >         "terms":{
> >           "time":0.0},
> >         "spellcheck":{
> >           "time":6.0},
> >         "debug":{
> >           "time":1.0}}}}}
> >
> > -----Original Message-----
> > From: Edward Ribeiro <edward.ribeiro@gmail.com>
> > Sent: 07 January 2020 01:05
> > To: solr-user@lucene.apache.org
> > Subject: Re: Edismax ignoring queries containing booleans
> >
> > Hi Claire,
> >
> > You can add the following parameter `&debug=all` on the URL to bring
> > back debugging info and share with us (if you are using the Solr admin
> > UI you should check the `debugQuery` checkbox).
> >
> > Also, if you are searching a sequence of values you could perform a
> > range
> > query: recordID:[18 TO 20]
> >
> > Best,
> > Edward
> >
> > On Mon, Jan 6, 2020 at 10:46 AM Claire Pollard
> > <claire.pollard@imagen.io>
> > wrote:
> > >
> > > Ok... It doesn't work for me. I'm fairly new to Solr so any help
> > > would be appreciated!
> > >
> > > My managed-schema field and field type look like this:
> > >
> > > <field name="recordID" type="long" indexed="true" stored="true"
> > > required="true" multiValued="false" />
> > > <fieldType name="long" class="solr.LongPointField" sortMissingLast="true"
> > > omitNorms="true" />
> > >
> > > And my solrconfig.xml select/query handlers look like this:
> > >
> > >         <requestHandler name="/select" class="solr.SearchHandler">
> > >                 <lst name="defaults">
> > >                         <str name="echoParams">all</str>
> > >                         <!-- Query settings -->
> > >                         <str name="defType">edismax</str>
> > >                         <str name="qf">
> > >                                 &#x09;text^0.4 recordID^10.0
> > > annotations^0.5 collectionTitle^1.9 collectionDescription^0.9 title^2.0
> > > Test_FR^1.0 Test_DE^1.0 Test_AR^1.0 genre^1.0 genre_fr^1.0
> > > french2^1.0&#x0D;&#x0A;
> > >                         </str>
> > >                         <str name="df">text</str>
> > >                         <str name="q.alt">*:*</str>
> > >                         <str name="rows">10</str>
> > >                         <str name="fl">*,score</str>
> > >                         <str name="pf">
> > >                                 &#x09;text^0.2 recordID^10.0
> > > annotations^0.6 collectionTitle^2.0 collectionDescription^1.0 title^2.1
> > > Test_FR^1.1 Test_DE^1.1 Test_AR^1.1 genre^1.1 genre_fr^1.1
> > > french2^1.1&#x0D;&#x0A;</str>
> > >                         <str name="bf" />
> > >                         <str name="mm">&#x0D;&#x0A;       0&lt;1 2&lt;-1
> > > 5&lt;-2 6&lt;90%&#x0D;&#x0A;      </str>
> > >                         <int name="ps">100</int>
> > >                         <!--SpellChecking -->
> > >                         <str name="df">text</str>
> > >                         <!-- Solr will use suggestions from both the 'default' spellchecker
> > >      and from the 'wordbreak' spellchecker and combine them.
> > >      collations (re-written queries) can include a combination of
> > >      corrections from both spellcheckers -->
> > >                         <str name="spellcheck.dictionary">default</str>
> > >                         <str name="spellcheck.dictionary">wordbreak</str>
> > >                         <str name="spellcheck">on</str>
> > >                         <str name="spellcheck.extendedResults">true</str>
> > >                         <str name="spellcheck.count">10</str>
> > >                         <str name="spellcheck.alternativeTermCount">5</str>
> > >                         <str name="spellcheck.maxResultsForSuggest">5</str>
> > >                         <str name="spellcheck.collate">true</str>
> > >                         <str name="spellcheck.collateExtendedResults">true</str>
> > >                         <str name="spellcheck.maxCollations">5</str>
> > >                 </lst>
> > >                 <arr name="last-components">
> > >                         <str>spellcheck</str>
> > >                 </arr>
> > >                 <!-- In addition to defaults, "appends" params can be specified
> > >          to identify values which should be appended to the list of
> > >          multi-val params from the query (or the existing "defaults").
> > >       -->
> > >         </requestHandler>
> > >
> > >         <requestHandler name="/query" class="solr.SearchHandler">
> > >                 <lst name="defaults">
> > >                         <str name="echoParams">explicit</str>
> > >                         <str name="wt">json</str>
> > >                         <str name="indent">true</str>
> > >                         <str name="df">text</str>
> > >                 </lst>
> > >         </requestHandler>
> > >
> > > Is there anything else that might be useful in helping diagnose
> > > what's going wrong for me?
> > >
> > > Cheers,
> > > Claire.
> > >
> > > -----Original Message-----
> > > From: Saurabh Sharma <saurabh.infoedge@gmail.com>
> > > Sent: 06 January 2020 11:20
> > > To: solr-user@lucene.apache.org
> > > Subject: Re: Edismax ignoring queries containing booleans
> > >
> > > It should work well. I have just tested the same with 8.3.0.
> > >
> > > Thanks
> > > Saurabh Sharma
> > >
> > > On Mon, Jan 6, 2020, 4:31 PM Claire Pollard
> > > <claire.pollard@imagen.io>
> > > wrote:
> > >
> > > > I'm using:
> > > >
> > > > recordID:(18 OR 19 OR 20)
> > > >
> > > > Which should return 2 records (as 18 doesn't exist), but it
> > > > returns none.
> > > > recordID is a LongPointField (sorry I said Int in my previous
> > > > message).
> > > >
> > > > -----Original Message-----
> > > > From: Saurabh Sharma <saurabh.infoedge@gmail.com>
> > > > Sent: 06 January 2020 10:35
> > > > To: solr-user@lucene.apache.org
> > > > Subject: Re: Edismax ignoring queries containing booleans
> > > >
> > > > Please share the query which you are creating.
> > > >
> > > > On Mon, Jan 6, 2020, 3:52 PM Claire Pollard
> > > > <claire.pollard@imagen.io>
> > > > wrote:
> > > >
> > > > > In Solr 8.3.0 I've got an edismax query parser in my search
> > > > > handler, and it seems to be ignoring Boolean operators such as
> > > > > AND and OR when searching using an IntPointField.
> > > > >
> > > > > I was hoping to use a query to this field to return a batch of
> > > > > documents with non-sequential IDs, so a range would be
> > > > > inappropriate.
> > > > >
> > > > > We had a previous 4.10.2 instance of Solr which uses the now
> > > > > deprecated Trie fields, and these seem to search without issue
> > > > > using boolean operators.
> > > > >
> > > > > Is there something extra I need to do with my setup for
> > > > > PointFields to use booleans or should they work as default.
> > > > >
> > > > > Cheers,
> > > > > Claire.
> > > > >
> > > >
> >
> >
