lucene-solr-user mailing list archives

From david.dav...@correo.aeat.es
Subject Re: Problem with queries that includes NOT
Date Thu, 26 Feb 2015 07:59:59 GMT
Hi,

I thought that we were using the edismax query parser, but it seems that
we had configured the dismax parser.
I have run some tests with the edismax parser and it works fine, so I'll
switch to it in our production Solr.
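
For anyone who wants to compare the two parsers before touching the configuration, the parser can also be chosen per request with the defType parameter. The following is only a sketch in Python: the host, the core name "mycore" and the helper build_query_url are assumptions for illustration, not part of our setup.

```python
from urllib.parse import urlencode

# Hypothetical host and core name; adjust to the actual installation.
SOLR_SELECT = "http://localhost:8983/solr/mycore/select"


def build_query_url(q, parser="edismax"):
    """Build a select URL that asks Solr to use the given query parser."""
    params = {"q": q, "defType": parser, "wt": "json"}
    return f"{SOLR_SELECT}?{urlencode(params)}"


# The query from this thread that returned 0 documents with our old setup.
url = build_query_url(
    '(NOT Proc:"ID01" AND NOT FileType:PDF_TEXT) AND sys_FileType:PROTOTIPE'
)
print(url)
```

Calling build_query_url(q, parser="dismax") with the same query gives a URL for the other parser, so the two hit counts can be compared side by side.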

Regards,

David Dávila
DIT - 915828763




From:    Alvaro Cabrerizo <toporniz@gmail.com>
To:      "solr-user@lucene.apache.org" <solr-user@lucene.apache.org>
Date:    25/02/2015 16:41
Subject: Re: Problem with queries that includes NOT



Hi,

The edismax parser should be able to handle the query you want to run.
I ran a test, and both of the following queries give me the right result
(shown in parentheses):

   - {!edismax}(NOT id:99997 AND NOT id:99998 AND id:99999)   (gives 1 hit: id:99999)
   - {!edismax}((NOT id:99997 AND NOT id:99998) AND id:99999) (gives 1 hit: id:99999)

In general, the issue appears when the lucene query parser is used with a
mix of different boolean clauses (including NOT). Thus, as you noted, the
following queries give different results:

   - NOT id:99997 AND NOT id:99998 AND id:99999   (gives 1 hit: id:99999)
   - (NOT id:99997 AND NOT id:99998) AND id:99999 (gives 0 hits when expecting 1)

Ever since I read the chapter "Limitations of prohibited clauses in
sub-queries" from the "Apache Solr 3 Enterprise Search Server" many years
ago, I always add the all-documents query clause *:* to the negative
clauses to avoid the problem you mentioned. I would therefore recommend
rewriting the query you showed us as:

   - (*:* AND NOT Proc:"ID01" AND NOT FileType:PDF_TEXT) AND
   sys_FileType:PROTOTIPE
   - (NOT id:99997 AND NOT id:99998 AND *:*) AND id:99999 (gives 1 hit
   as expected)

The above query can then be read as: give me all the documents except those
having ID01 or PDF_TEXT, and of those, only the ones having PROTOTIPE.
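
Why the extra *:* clause helps can be illustrated with a toy model. This is only a sketch in Python over made-up documents, assuming the simplified rule that prohibited clauses merely remove documents from whatever the positive clauses selected. It is not Lucene's actual code, and the names term, boolean and DOCS are hypothetical.

```python
# Toy index: doc id -> field values (all data made up for illustration).
DOCS = {
    1: {"Proc": "ID01", "FileType": "PDF_TEXT", "sys_FileType": "PROTOTIPE"},
    2: {"Proc": "ID02", "FileType": "PDF_TEXT", "sys_FileType": "PROTOTIPE"},
    3: {"Proc": "ID02", "FileType": "HTML", "sys_FileType": "PROTOTIPE"},
    4: {"Proc": "ID03", "FileType": "HTML", "sys_FileType": "OTHER"},
}
ALL = set(DOCS)  # plays the role of the *:* clause


def term(field, value):
    """Ids of the documents whose field equals value."""
    return {doc_id for doc_id, fields in DOCS.items() if fields[field] == value}


def boolean(must=(), must_not=()):
    """Simplified boolean query: prohibited clauses only filter the
    documents that the positive (must) clauses selected."""
    if not must:
        return set()  # nothing was selected, so there is nothing to filter
    result = set.intersection(*map(set, must))
    for excluded in must_not:
        result -= excluded
    return result


negatives = [term("Proc", "ID01"), term("FileType", "PDF_TEXT")]
prototipe = term("sys_FileType", "PROTOTIPE")

# (NOT Proc:ID01 AND NOT FileType:PDF_TEXT) AND sys_FileType:PROTOTIPE
broken = boolean(must=[boolean(must_not=negatives), prototipe])

# (NOT Proc:ID01 AND NOT FileType:PDF_TEXT AND sys_FileType:PROTOTIPE)
flat = boolean(must=[prototipe], must_not=negatives)

# (*:* AND NOT Proc:ID01 AND NOT FileType:PDF_TEXT) AND sys_FileType:PROTOTIPE
fixed = boolean(must=[boolean(must=[ALL], must_not=negatives), prototipe])

print(broken, flat, fixed)
```

In this model the purely negative sub-query selects nothing by itself, so combining it with another clause through AND yields an empty result; giving it *:* as a positive clause provides the set that the NOT clauses subtract from.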

Regards.




On Wed, Feb 25, 2015 at 1:23 PM, Shawn Heisey <apache@elyograg.org> wrote:

> On 2/25/2015 4:04 AM, david.davila@correo.aeat.es wrote:
> > We have problems with some queries. All of them include the tag NOT, and
> > in my opinion, the results don't make any sense.
> >
> > First problem:
> >
> > This query "NOT Proc:ID01" returns 95806 results, however this one,
> > "NOT Proc:ID01 OR FileType:PDF_TEXT", returns 11484 results. But it's
> > impossible that adding an OR clause makes the query return fewer results.
> >
> > Second problem. Here the problem is because of the brackets and the NOT
> > tag:
> >
> >  This query:
> >
> > (NOT Proc:"ID01" AND NOT FileType:PDF_TEXT) AND sys_FileType:PROTOTIPE
> > returns 0 documents.
> >
> > But this query:
> >
> > (NOT Proc:"ID01" AND NOT FileType:PDF_TEXT AND sys_FileType:PROTOTIPE)
> > returns 53 documents, which is correct. So, the problem is the position of
> > the bracket. I have checked the same query without NOTs, and it works fine,
> > returning the same number of results in both cases.  So, I think the
> > problem is the combination of the bracket positions and the NOT tag.
>
> For the first query, there is a difference between "NOT condition1 OR
> condition2" and "NOT (condition1 OR condition2)" ... I can imagine the
> first one increasing the document count compared to just "NOT
> condition1" ... the second one wouldn't increase it.
>
> Boolean queries in Solr (and very likely Lucene as well) do not always
> do what people expect.
>
> http://robotlibrarian.billdueber.com/2011/12/solr-and-boolean-operators/
> https://lucidworks.com/blog/why-not-and-or-and-not/
>
> As mentioned in the second link above, you'll get better results if you
> use the prefix operators with explicit parentheses.  One word of
> warning, though -- the prefix operators do not work correctly if you
> change the default operator to AND.
>
> Thanks,
> Shawn
>
>

