lucene-java-user mailing list archives

From Morus Walter
Subject RE: Query Parser AND / OR
Date Sun, 28 Dec 2003 16:24:38 GMT
Jamie Stallwood wrote:

> What Morus is saying is right, an expression without parentheses, when
> interpreted, assumes terms on either side of an AND clause are compulsory
> terms, and any terms on either side of an OR clause are optional. However,
> if you combine AND and OR in an expression, the optional terms have no
> effect because the others are compulsory.
> What needs to be done is that the query parser should process any query
> string that has AND, and "put brackets" round it first. As it stands it is
> no use, as the OR does not work in the way you would think. AND should be
> given implicit priority.

I had a closer look at this and wrote a patch that implements it by
changing the vector of boolean clauses into a vector of vectors of boolean
clauses in the addClause method of the query parser. A new sub-vector is
created whenever an explicit OR operator is used.

Queries using explicit AND/OR are grouped so that AND takes precedence over OR.
That is, `a OR b AND c' is parsed as `a OR (b AND c)'.

Queries using implicit AND/OR (depending on the default operator) are handled
as before (so one can still use `a +b -c' to create one boolean query, where
b is required, c forbidden and a optional).

It's less clear how a query using both explicit AND/OR and implicit operators
should be handled.
Since the patch groups on explicit OR operators, a query
`a OR b c' is read as `a (b c)' and
`a AND b c' as `+a +b c'
(given that the default operator OR is used).
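
To make the grouping concrete, here is a minimal Java sketch of the idea. It is
not the code from the attached patch: the class GroupingSketch, its Clause type
and the parse and render methods are hypothetical names, it only handles flat
queries without parentheses, and it assumes the default operator OR. The way a
single-clause group is printed is also just an assumption chosen to match the
examples above.

```java
import java.util.ArrayList;
import java.util.List;

// Hypothetical sketch, not the code from the attached patch.
class GroupingSketch {
    // One boolean clause: a term plus its required/prohibited flags.
    static final class Clause {
        final String term;
        boolean required;
        boolean prohibited;

        Clause(String term) { this.term = term; }

        @Override
        public String toString() {
            return (required ? "+" : "") + (prohibited ? "-" : "") + term;
        }
    }

    // Parses a flat query (no parentheses), assuming default operator OR.
    static String parse(String query) {
        List<List<Clause>> groups = new ArrayList<>();
        groups.add(new ArrayList<>());
        String conj = "";        // pending explicit operator, "" if implicit
        boolean negate = false;  // set by a preceding NOT
        for (String tok : query.trim().split("\\s+")) {
            if (tok.equals("AND") || tok.equals("OR")) { conj = tok; continue; }
            if (tok.equals("NOT")) { negate = true; continue; }
            List<Clause> current = groups.get(groups.size() - 1);
            if (conj.equals("OR") && !current.isEmpty()) {
                // An explicit OR starts a new sub-list.
                current = new ArrayList<>();
                groups.add(current);
            }
            boolean plus = tok.startsWith("+") && tok.length() > 1;
            boolean minus = tok.startsWith("-") && tok.length() > 1;
            Clause clause = new Clause(plus || minus ? tok.substring(1) : tok);
            clause.prohibited = negate || minus;
            clause.required = !clause.prohibited && (plus || conj.equals("AND"));
            if (conj.equals("AND") && !current.isEmpty()) {
                // An explicit AND also makes the preceding clause required.
                Clause prev = current.get(current.size() - 1);
                if (!prev.prohibited) prev.required = true;
            }
            current.add(clause);
            conj = "";
            negate = false;
        }
        return render(groups);
    }

    // Prints a single group flat; otherwise parenthesizes every group that
    // is more than one plain optional term.
    static String render(List<List<Clause>> groups) {
        List<String> parts = new ArrayList<>();
        for (List<Clause> group : groups) {
            List<String> items = new ArrayList<>();
            for (Clause c : group) items.add(c.toString());
            String text = String.join(" ", items);
            boolean bare = groups.size() == 1
                    || (group.size() == 1 && !group.get(0).required && !group.get(0).prohibited);
            parts.add(bare ? text : "(" + text + ")");
        }
        return String.join(" ", parts);
    }

    public static void main(String[] args) {
        String[] queries = {"a OR b AND c", "a +b -c", "a OR b c", "a AND b c", "a OR NOT b"};
        for (String q : queries) System.out.println(q + "  ->  " + parse(q));
    }
}
```

In this sketch `a OR b AND c' comes out as `a (+b +c)', which is the clause
form of `a OR (b AND c)', and a query without any explicit OR stays a single
flat list of clauses, as before.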

There's one issue left:
The old query parser reads a query
`a OR NOT b' as `a -b' which is the same as `a AND NOT b'.
The modified query parser reads this as `a (-b)'.
While this looks better (at least to me), it does not produce the result
of a OR NOT b. Instead the (-b) part seems to be silently dropped.
While I understand that this query is illegal (just searching for one negative
term), I don't think that silently dropping this part is an appropriate
way to deal with that. But I don't think that's a query parser issue.
The only question is whether the query parser should take care of that.
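
The difference between the three readings can be illustrated with a small
stand-alone Java model. It does not use Lucene. The class NegativeClauseSketch
and its methods are hypothetical, and the matching rule in matchesFlat (a
boolean query needs at least one matching positive clause, so a group with only
prohibited clauses matches nothing) is an assumption that is consistent with
the "silently dropped" behaviour described above.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;
import java.util.Set;
import java.util.TreeSet;
import java.util.function.Predicate;

// Hypothetical model of boolean matching; it does not call Lucene.
class NegativeClauseSketch {
    // Assumed semantics of one flat boolean query: every "+term" must occur,
    // no "-term" may occur, and without required clauses at least one
    // optional term must occur. A purely negative query matches nothing.
    static boolean matchesFlat(Set<String> doc, String... clauses) {
        boolean hasRequired = false;
        boolean optionalHit = false;
        for (String clause : clauses) {
            if (clause.startsWith("-")) {
                if (doc.contains(clause.substring(1))) return false;
            } else if (clause.startsWith("+")) {
                if (!doc.contains(clause.substring(1))) return false;
                hasRequired = true;
            } else if (doc.contains(clause)) {
                optionalHit = true;
            }
        }
        return hasRequired || optionalHit;
    }

    // Four tiny documents: empty, "a", "b" and "a b".
    static List<Set<String>> docs() {
        List<Set<String>> docs = new ArrayList<>();
        docs.add(new TreeSet<>());
        docs.add(new TreeSet<>(Arrays.asList("a")));
        docs.add(new TreeSet<>(Arrays.asList("b")));
        docs.add(new TreeSet<>(Arrays.asList("a", "b")));
        return docs;
    }

    static List<Set<String>> select(Predicate<Set<String>> query) {
        List<Set<String>> hits = new ArrayList<>();
        for (Set<String> doc : docs()) {
            if (query.test(doc)) hits.add(doc);
        }
        return hits;
    }

    public static void main(String[] args) {
        // Old parser: one flat query.
        System.out.println("a -b       : " + select(d -> matchesFlat(d, "a", "-b")));
        // Patched parser: two optional sub-queries, the second purely negative.
        System.out.println("a (-b)     : " + select(d -> matchesFlat(d, "a") || matchesFlat(d, "-b")));
        // What a logical reading of the query would return.
        System.out.println("a OR NOT b : " + select(d -> d.contains("a") || !d.contains("b")));
    }
}
```

Under these assumptions `a (-b)' returns exactly the documents containing a,
which is neither the old result nor the logical `a OR NOT b'.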

I attached the patch (made against 1.3rc3 but working for 1.3final as well)
and a test program.
The test program parses a number of queries with the default-OR and the
default-AND operator and reparses the result of the toString method of the
created query.
It outputs the initial query, the query parsed with default OR, its reparsed
form, the query parsed with default AND and its reparsed form.
If called with the -q option, it also runs the queries against an index
consisting of all documents in which each of the terms a, b, c and d occurs
once or not at all.
Using an unpatched and a patched version of Lucene in the classpath, one
can look at the effect of the patch in detail.
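
For illustration, the contents of such a test index could be enumerated as in
the following Java sketch. This is not the attached test program. The class
TestIndexSketch is a hypothetical name, and the sketch assumes that the index
holds one document per combination of the four terms, each term present or
absent, which gives 16 documents including the empty one.

```java
import java.util.ArrayList;
import java.util.List;

// Hypothetical sketch; the attached test program may build its index differently.
class TestIndexSketch {
    // Returns every combination of the terms a, b, c, d as a document text.
    static List<String> documents() {
        String[] terms = {"a", "b", "c", "d"};
        List<String> docs = new ArrayList<>();
        // Each bit of the mask decides whether the corresponding term is present.
        for (int mask = 0; mask < (1 << terms.length); mask++) {
            StringBuilder doc = new StringBuilder();
            for (int i = 0; i < terms.length; i++) {
                if ((mask & (1 << i)) != 0) {
                    if (doc.length() > 0) doc.append(' ');
                    doc.append(terms[i]);
                }
            }
            docs.add(doc.toString());
        }
        return docs;
    }

    public static void main(String[] args) {
        for (String doc : documents()) System.out.println("[" + doc + "]");
    }
}
```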

I'm interested in your comments. If no one objects to the patch, I'll file
a bug report so it doesn't get lost.