lucene-dev mailing list archives

From "Doron Cohen (JIRA)" <j...@apache.org>
Subject [jira] Commented: (LUCENE-933) QueryParser can produce empty sub BooleanQueries when Analyzer produces no tokens for input
Date Wed, 20 Jun 2007 22:21:26 GMT

    [ https://issues.apache.org/jira/browse/LUCENE-933?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel#action_12506703 ]

Doron Cohen commented on LUCENE-933:
------------------------------------

So an acceptable solution is:
  The query parser will ignore empty clauses (e.g. ' ( ) ') that result from word filtering, the
same as it already does for single words.

A straightforward fix is for QueryParser to avoid adding null (inner) queries to (outer)
clause sets. (It makes sense, too.)
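
To make the idea concrete, here is a rough sketch of "skip null inner queries" on a toy
model. ToyQuery, term() and group() are names I made up for illustration; this is not
Lucene's Query/BooleanQuery and not the actual patch, and it assumes a filtered word is
represented as null.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;

/** Toy stand-in for a parsed query; NOT Lucene's Query or BooleanQuery. */
final class ToyQuery {
    final String term;             // non-null for a leaf term
    final List<ToyQuery> clauses;  // non-null for a boolean group

    private ToyQuery(String term, List<ToyQuery> clauses) {
        this.term = term;
        this.clauses = clauses;
    }

    /** Leaf query, or null when the (hypothetical) stop list filters the word out. */
    static ToyQuery term(String word, List<String> stopWords) {
        return stopWords.contains(word) ? null : new ToyQuery(word, null);
    }

    /** Group that skips null children and collapses to null when nothing is left. */
    static ToyQuery group(List<ToyQuery> children) {
        List<ToyQuery> kept = new ArrayList<>();
        for (ToyQuery child : children) {
            if (child != null) {   // the fix: never add a null inner query
                kept.add(child);
            }
        }
        return kept.isEmpty() ? null : new ToyQuery(null, kept);
    }

    @Override
    public String toString() {
        if (term != null) {
            return term;
        }
        StringBuilder sb = new StringBuilder();
        for (ToyQuery c : clauses) {
            if (sb.length() > 0) {
                sb.append(' ');
            }
            sb.append(c.clauses != null ? "(" + c + ")" : c.toString());
        }
        return sb.toString();
    }

    public static void main(String[] args) {
        List<String> stop = Arrays.asList("stop");
        // Models "a (stop) b": the inner group collapses to null and is dropped.
        ToyQuery q = group(Arrays.asList(
                term("a", stop),
                group(Arrays.asList(term("stop", stop))),
                term("b", stop)));
        System.out.println(q);
    }
}
```

Note that in this sketch a group whose children are all filtered out becomes null itself,
which is exactly what leads to the side effect described next.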

However this has a side effect: 
  For queries that became "empty" as a result of filtering (stopping), QueryParser would now
return null. 

This is an API semantics change, because applications that used to get a BooleanQuery with
0 clauses as the parse result would now get a null query. 

Here is a closer look at the behavior change:

Original behavior:
   (1)  parse(" ")  == ParseException
   (2)  parse("( )")  == ParseException
   (3)  parse("stop") == " "    
        (actually a boolean query with 0 clauses)
   (4)  parse("(stop)")  == " "    
        (actually a boolean query with 0 clauses)
   (5)  parse("a stop b") == "a b"
   (6)  parse("a (stop) b") == "a () b"   
        (middle part is a boolean query with 0 clauses)
   (7)  parse("a ((stop)) b") == "a () b" 
        (again middle part is a boolean query with 0 clauses)

Modified behavior:   
   (3)  parse("stop") == null
   (4)  parse("(stop)")  == null    
   (6)  parse("a (stop) b") == "a b"   
   (7)  parse("a ((stop)) b") == "a b" 

I think the modified behavior is the right one - applications can test a query for being null
and realize that it is a no-op. 

However backwards compatibility is important - would this change break existing applications
with annoying new NPEs?

As an alternative, QueryParser parse() methods can be modified to return a phony empty BQ
instead of returning null, for the sake of backwards compatibility.
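
A rough sketch of that alternative, again on a toy model rather than the real QueryParser:
parseInternal() and ToyBooleanQuery are hypothetical names, the "parser" just splits on
whitespace and drops stop words, and the ParseException cases are not modeled. The only
point is the null-to-empty substitution in the outer parse().

```java
import java.util.ArrayList;
import java.util.Collections;
import java.util.List;

final class NullSafeParseSketch {

    /** Toy boolean query holding only its clause texts; NOT Lucene's BooleanQuery. */
    static final class ToyBooleanQuery {
        final List<String> clauses;

        ToyBooleanQuery(List<String> clauses) {
            this.clauses = clauses;
        }
    }

    /** Hypothetical inner parse: returns null when every word was filtered out. */
    static ToyBooleanQuery parseInternal(String input, List<String> stopWords) {
        List<String> kept = new ArrayList<>();
        for (String word : input.trim().split("\\s+")) {
            if (!word.isEmpty() && !stopWords.contains(word)) {
                kept.add(word);
            }
        }
        return kept.isEmpty() ? null : new ToyBooleanQuery(kept);
    }

    /** Outer parse: keeps the old contract by never returning null. */
    static ToyBooleanQuery parse(String input, List<String> stopWords) {
        ToyBooleanQuery q = parseInternal(input, stopWords);
        return q != null ? q : new ToyBooleanQuery(Collections.<String>emptyList());
    }
}
```

With this wrapper, existing callers keep receiving a query with 0 clauses for input such as
"stop", while inner empty groups can still be dropped during parsing.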

Thoughts?

> QueryParser can produce empty sub BooleanQueries when Analyzer produces no tokens for
input
> -------------------------------------------------------------------------------------------
>
>                 Key: LUCENE-933
>                 URL: https://issues.apache.org/jira/browse/LUCENE-933
>             Project: Lucene - Java
>          Issue Type: Bug
>            Reporter: Hoss Man
>            Assignee: Doron Cohen
>
> as triggered by SOLR-261, if you have a query like this...
>    +foo:BBB  +(yak:AAA  baz:CCC)
> ...where the analyzer produces no tokens for the "yak:AAA" or "baz:CCC" portions of the
query (possibly because they are stop words) the resulting query produced by the QueryParser
will be...
>   +foo:BBB +()
> ...that is a BooleanQuery with two required clauses, one of which is an empty BooleanQuery
with no clauses.
> this does not appear to be "good" behavior.
> In general, QueryParser should be smarter about what it does when parsing encounters
parens whose contents result in an empty BooleanQuery -- but what exactly it should do in
the following situations...
>  a)  +foo:BBB +()
>  b)  +foo:BBB ()
>  c)  +foo:BBB -()
> ...is up for interpretation.  I would think situation (b) clearly lends itself to dropping
the sub-BooleanQuery completely.  Situation (c) may also lend itself to that solution, since
semantically it means "don't allow a match on any queries in the empty set of queries".  ....
I have no idea what the "right" thing to do for situation (a) is.

-- 
This message is automatically generated by JIRA.
-
You can reply to this email to add a comment to the issue online.


---------------------------------------------------------------------
To unsubscribe, e-mail: java-dev-unsubscribe@lucene.apache.org
For additional commands, e-mail: java-dev-help@lucene.apache.org

