lucene-java-user mailing list archives

From Dror Matalon <>
Subject Re: Query Parser AND / OR
Date Tue, 30 Dec 2003 23:37:37 GMT
On Tue, Dec 30, 2003 at 11:19:38PM +0100, Morus Walter wrote:
> Hi Dror,
> > 
> > For one thing, I would advocate for using the '+' notation as the
> > underlying syntax and migrating to boolean operators, since many
> > more people are used to that syntax, and I believe it's better
> > understood.
> > 
> I'm not sure if I understand what you mean here.

I meant that the query parser would accept AND and OR, which would be
translated into '+' and '-', but would not accept '+' and '-' directly.
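A toy Python sketch of this proposal, assuming a flat query of single terms. The hypothetical `to_clauses` function accepts AND, OR and (as an added assumption) NOT, maps them to required (`+`), optional (no prefix) and prohibited (`-`) flags, and rejects the prefix operators as input. It is not Lucene's QueryParser, and it only handles uniform queries cleanly; mixing AND and OR without parentheses runs into the ambiguity discussed later in the thread.

```python
REQUIRED, OPTIONAL, PROHIBITED = "+", "", "-"
KEYWORDS = {"AND", "OR", "NOT"}


def to_clauses(query: str) -> list[tuple[str, str]]:
    """Translate a flat AND/OR/NOT query into (flag, term) clauses.

    Hypothetical sketch: boolean keywords are the only accepted syntax;
    '+' and '-' prefixes are rejected instead of being interpreted.
    """
    clauses: list[list[str]] = []  # mutable [flag, term] so AND can update its left operand
    pending_and = pending_not = False
    for tok in query.split():
        if tok in KEYWORDS:
            pending_and |= tok == "AND"
            pending_not |= tok == "NOT"
            continue
        if tok[0] in "+-":
            raise ValueError(f"prefix operators are not accepted: {tok!r}")
        if pending_and and clauses and clauses[-1][0] == OPTIONAL:
            clauses[-1][0] = REQUIRED  # AND makes its left operand required too
        if pending_not:
            flag = PROHIBITED
        elif pending_and:
            flag = REQUIRED
        else:
            flag = OPTIONAL
        clauses.append([flag, tok])
        pending_and = pending_not = False
    return [(flag, term) for flag, term in clauses]


print(to_clauses("a AND b"))      # [('+', 'a'), ('+', 'b')]
print(to_clauses("a OR b"))       # [('', 'a'), ('', 'b')]
print(to_clauses("a AND NOT c"))  # [('+', 'a'), ('-', 'c')]
```

Input such as `+a +b` raises a `ValueError` here, which is the "does not accept '+' and '-' directly" part of the proposal.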

> > > 
> > > The other alternative would be to forbid queries mixing default operators and
> > > explicit and/or. This is what I'd probably vote for at the moment.
> > 
> > At first I was inclined to agree but as a rule I think we should adopt
> > the WWGD (What Would Google Do) philosophy, since that's the syntax and
> > behavior that most people are used to.
> > 
> > It looks like it basically adds an "AND" between any two terms that
> > don't have an operator between them. We could do the same for both the
> > default AND and the default OR. Once you've done that, you just use the
> > standard boolean logic precedence rules.
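A minimal Python sketch of the rule described above, assuming flat queries of single terms with AND/OR only (no NOT, prefixes or parentheses). The function names are hypothetical; this models the proposed semantics (insert the default operator, then let AND bind tighter than OR) rather than Lucene's QueryParser.

```python
OPERATORS = {"AND", "OR"}


def insert_default(tokens: list[str], default: str = "AND") -> list[str]:
    """Put the default operator between adjacent terms that have none."""
    out: list[str] = []
    for tok in tokens:
        if out and out[-1] not in OPERATORS and tok not in OPERATORS:
            out.append(default)
        out.append(tok)
    return out


def to_dnf(tokens: list[str]) -> list[list[str]]:
    """Group terms so that AND binds tighter than OR.

    Returns a disjunction of conjunctions: [["a", "b"], ["c"]] means
    (a AND b) OR c.
    """
    groups: list[list[str]] = [[]]
    for tok in tokens:
        if tok == "OR":
            groups.append([])
        elif tok != "AND":
            groups[-1].append(tok)
    return groups


def matches(query: str, doc: set[str], default: str = "AND") -> bool:
    groups = to_dnf(insert_default(query.split(), default))
    return any(all(term in doc for term in group) for group in groups)


print(insert_default("a b OR c".split()))          # ['a', 'AND', 'b', 'OR', 'c']
print(to_dnf(insert_default("a b OR c".split())))  # [['a', 'b'], ['c']]
print(matches("a b OR c", {"c"}))  # True
print(matches("a b OR c", {"a"}))  # False
```

With `default="OR"` the same input becomes `a OR b OR c`, so one rule covers both default operators.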
> > 
> Hmm. Then you lose the possibility to create BooleanQuery objects where
> some of the terms are required, some forbidden, and some have neither flag.
> Keeping this possibility is the reason why I say that implicit AND/OR and
> explicit AND/OR need to be different things.
> If an implicit OR equals an explicit OR, you would have '+a +b' = '+a OR +b'
> = '(+a) OR (+b)' = 'a OR b', which is probably not what was intended.
> So either the '+' operator is removed or it is used as an alternative to AND
> in which case it could not be a prefix. So instead of '+a +b' one would use 
> 'a + b'.
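To see the collapse concretely, here is a small Python model of a boolean query whose clauses carry a required, prohibited or neither flag. The `BoolQuery` class and `matches` function are hypothetical; the matching rule (prohibited clauses exclude, required clauses must all match, otherwise at least one optional clause must match) is a simplification of BooleanQuery's matching behavior, ignoring scoring.

```python
from dataclasses import dataclass, field
from typing import Union


@dataclass
class BoolQuery:
    """Toy boolean query: a list of (flag, subquery) clauses.

    flag is "+" (required), "-" (prohibited) or "" (neither);
    subquery is a term string or a nested BoolQuery.
    """
    clauses: list = field(default_factory=list)


Query = Union[str, BoolQuery]


def matches(q: Query, doc: set[str]) -> bool:
    if isinstance(q, str):
        return q in doc
    required = [sub for flag, sub in q.clauses if flag == "+"]
    optional = [sub for flag, sub in q.clauses if flag == ""]
    prohibited = [sub for flag, sub in q.clauses if flag == "-"]
    if any(matches(sub, doc) for sub in prohibited):
        return False
    if required:
        # once a clause is required, optional clauses are not needed to match
        return all(matches(sub, doc) for sub in required)
    return any(matches(sub, doc) for sub in optional)


# '+a +b' as one query: both terms required
plus_a_plus_b = BoolQuery([("+", "a"), ("+", "b")])
# '(+a) OR (+b)': two optional sub-queries, each requiring one term
or_of_required = BoolQuery([("", BoolQuery([("+", "a")])), ("", BoolQuery([("+", "b")]))])

print(matches(plus_a_plus_b, {"a"}))   # False
print(matches(or_of_required, {"a"}))  # True, i.e. it behaves like 'a OR b'
```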

Which is my point above. It's too confusing to have:
1. '+' and '-'
2. Explicit AND and OR
3. Implicit AND or OR

There's some redundancy between all three, and it's quite easy to get confused.

> A consequence of pure boolean operators is that there won't be a way of
> serializing an arbitrary query to a parsable string in standard query parser
> syntax.
> So for completeness and compatibility with the current query parser, I would 
> keep the current behaviour of queries without explicit boolean operators.
> The problem for users isn't that big IMHO.
> Unless a user decides to make use of the '+' operator, things are pretty clear:
> a b c searches for documents containing any or all of these terms (depending
> on the default operator). Using terms with the '-' operator also does what
> one expects. Only if the user starts to use the '+' operator explicitly
> do things get more complicated. So he just shouldn't do that unless
> he knows what he is doing.
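To make the default-operator behavior concrete, here is a small Python model, assuming whitespace-separated terms with optional '+'/'-' prefixes. `parse_implicit` and `matches` are hypothetical names, and the matching rule is a simplified model of required/prohibited/optional clauses, not Lucene code. It shows that `a b c` and `-` behave as expected, while an explicit `+` changes what the unprefixed terms mean.

```python
def parse_implicit(query: str, default: str = "OR") -> list[tuple[str, str]]:
    """Turn 'a b -c' style input into (flag, term) clauses.

    Unprefixed terms get the flag implied by the default operator:
    "" (optional) for OR, "+" (required) for AND.
    """
    base = "+" if default == "AND" else ""
    clauses = []
    for tok in query.split():
        if tok[0] in "+-":
            clauses.append((tok[0], tok[1:]))
        else:
            clauses.append((base, tok))
    return clauses


def matches(clauses: list[tuple[str, str]], doc: set[str]) -> bool:
    if any(term in doc for flag, term in clauses if flag == "-"):
        return False
    required = [term for flag, term in clauses if flag == "+"]
    if required:
        return all(term in doc for term in required)
    return any(term in doc for flag, term in clauses if flag == "")


print(matches(parse_implicit("a b c"), {"b"}))         # True: default OR needs any term
print(matches(parse_implicit("a b c", "AND"), {"b"}))  # False: default AND needs all terms
print(matches(parse_implicit("a b -c"), {"a", "c"}))   # False: c is prohibited
print(matches(parse_implicit("+a b"), {"a"}))          # True: b no longer constrains the match
```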

Fair enough.

> The same thing applies to queries using AND/OR as long as you don't mix them
> with implicit operators. IMO whoever does the latter gets what he deserves
> if he has to deal with the difficulties of such queries. One just should
> not do that, and it should be pretty clear that the meaning of such a query
> is unclear (unless parentheses are used, in which case there is no mixing
> any longer).
> That is why I think my patch is good enough, even if it leaves the evaluation
> of such queries without a clear definition.
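As an illustration of why mixed queries are unclear, consider `a OR b c` with a default operator of AND: it can be read as `(a OR b) AND c` or as `a OR (b AND c)`. The short Python check below (the two reading functions are hypothetical) enumerates every document over the terms a, b and c and lists those on which the two readings disagree.

```python
from itertools import combinations

TERMS = ["a", "b", "c"]


def left_reading(doc: set[str]) -> bool:
    """(a OR b) AND c"""
    return ("a" in doc or "b" in doc) and "c" in doc


def right_reading(doc: set[str]) -> bool:
    """a OR (b AND c)"""
    return "a" in doc or ("b" in doc and "c" in doc)


# every subset of TERMS, i.e. every distinct document worth considering
docs = [set(combo) for r in range(len(TERMS) + 1) for combo in combinations(TERMS, r)]
disagree = [sorted(doc) for doc in docs if left_reading(doc) != right_reading(doc)]
print(disagree)  # [['a'], ['a', 'b']]
```

Writing either reading with explicit parentheses removes the choice, so the parser never has to guess.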

I guess I can be convinced. Clearly things are broken, and clearly if
your patch works as advertised, it should make things better rather than
worse. And a partial solution is better than no solution. So, if the
developers bless the patch, run it through the test suite and it comes
out looking good, I'm for it.

Again, thanks for spending the time on this.



> > Now the good news on all of this is that it seems (I did a small test)
> > that if you use parentheses the parser does the right thing. In my mind,
> > it's a good idea to use parentheses whenever you're creating complex
> > expressions.
> > 
> Sure. All we are talking about is what happens if there are no explicit
> parentheses. If you use parentheses you break the query into simple parts
> (e.g. (a AND b) OR (c AND d) consists of two queries of type 'x AND y' and one
> query of type 'x OR y' (where x and y are queries, not just terms)), which
> are handled correctly even by the current query parser.
> That's one of the reasons why this hasn't been a big problem in the past.
> If you use (a AND b) OR (c AND d) you will get what you expect.
> It's just that I think the query parser should also create a reasonable
> query if the parentheses are removed.
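A minimal Python sketch of a parser that gives the parenthesis-free form a reasonable meaning, assuming standard precedence (AND binds tighter than OR, as suggested earlier in the thread) and AND/OR only. The function names are hypothetical and this is not the patch under discussion; it only shows that, under that rule, `a AND b OR c AND d` parses to the same tree as `(a AND b) OR (c AND d)`.

```python
def tokenize(query: str) -> list[str]:
    return query.replace("(", " ( ").replace(")", " ) ").split()


def parse(query: str):
    """Parse AND/OR queries with parentheses into nested tuples.

    Grammar (AND binds tighter than OR):
        expr := conj ("OR" conj)*
        conj := atom ("AND" atom)*
        atom := TERM | "(" expr ")"
    """
    tokens = tokenize(query)
    pos = 0

    def peek():
        return tokens[pos] if pos < len(tokens) else None

    def take():
        nonlocal pos
        tok = peek()
        if tok is None:
            raise ValueError("unexpected end of query")
        pos += 1
        return tok

    def expr():
        node = conj()
        while peek() == "OR":
            take()
            node = ("OR", node, conj())
        return node

    def conj():
        node = atom()
        while peek() == "AND":
            take()
            node = ("AND", node, atom())
        return node

    def atom():
        tok = take()
        if tok == "(":
            node = expr()
            if take() != ")":
                raise ValueError("expected ')'")
            return node
        if tok in ("AND", "OR", ")"):
            raise ValueError(f"unexpected {tok!r}")
        return tok

    tree = expr()
    if peek() is not None:
        raise ValueError(f"unexpected {peek()!r}")
    return tree


print(parse("(a AND b) OR (c AND d)"))  # ('OR', ('AND', 'a', 'b'), ('AND', 'c', 'd'))
print(parse("a AND b OR c AND d"))      # same tree as above
```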
> Morus
> ---------------------------------------------------------------------
> To unsubscribe, e-mail:
> For additional commands, e-mail:

Dror Matalon
Zapatec Inc 
1700 MLK Way
Berkeley, CA 94709
