lucene-java-user mailing list archives

Site index · List index
Message view « Date » · « Thread »
Top « Date » · « Thread »
From Morus Walter <>
Subject Re: Query Parser AND / OR
Date Tue, 30 Dec 2003 20:13:30 GMT
Hi Dror,

thanks for your answer. I really appreciate your comments.
> > > 
> > > Before having patches, I think it's a good idea to agree on what the
> > > "right" solution is. 
> > I tried to raise that question in the first place. But there wasn't much
> > responce.
> Might be the time of the year when many people are busy with other
> stuff.
My impression was that many people don't have a problem with this issue.
Otherwise I'd expecpt that the issue was raised earlier.

> > So I decided to make a concrete suggestion, how to change things.
> > 
> > > Most of it is obvious using boolean logic, but we
> > > have some additional requirements like not having a query that only has
> > > a NOT clause. Is this the only exception?
> > > 
> > To me the problem is, that there are two forms of queries
> > - boolean queries (a OR b AND c...)
> > - list of terms where some are flagged required and some are flagged forbidden
> >  (a +b -c ...) (in two forms: with default or and default and)
> > 
> > For each of these it seems pretty clear, what they mean, but if you start
> > to combine the two in one query, I don't know what that should mean.
> > 
> > What's the meaning of a OR b c +d ?
> > (Acutally there must be two meanings, one for default or, one for default and).
> > Maybe it's obvious, but I fail to see it.
> You're right, it is confusing. Assuming default OR I would gess that the
> above means 
> b c +d 
> and assuming default AND it would mean
> +b +c +d 
> Is there another interpretation?
You left out the 'a' which I intended to be part of the query (sorry if this
was unclear).

I was thinking about this issue, and currently I think that the only way to 
define this type of queries formally, is to give the default operator it's own
precedence relativly to the precedence of 'OR' and 'AND'.
So there are two possibilities:
either the default operator has higher precedence than 'AND' or lower than 
For default OR in the first case
`a OR b c +d' would be equal to `(a OR b) c +d' == (a b) c +d
in the second to `a OR (b c +d)' == a (b c +d) 
For default AND one has `+(a b) +c +d' and `a (+b +c +d)'

(a b) c +d searches all documents containing d, occurences of a, b and c 
influence scoring
a (b c +d) searches documents containing `a' joined with documents 
containing `d' (where b and c influcence scoring)
Now, what's closer to what one might have meant by `a OR b c +d'?

+(a b) +c +d searches documents containing c, d and either a or b.
a (+b +c +d) searches documents containing a or each of b, c and d.

The other alternative would be to forbid queries mixing default operators and
explicit and/or. This is what I'd probably vote for at the moment.

The patch doesn't implement any of these, as it handles the default operator
on the same level as AND.

> > > 
> > > As far as the actual patch, I would suspect that a better approach than
> > > using java would be to use precedence operations in the actual parser.
> > 
> > Then you decide to do a complete rewrite of the query parser.
> > That's something I wanted to avoid.
> Ouch. I think you might be right. It might be a good idea to move this
> discussion to lucene-dev where we'd get more attention from the
> developers. This seems more like a developer issue than a user issue.
Hmm. That's be up to the developers.
Don't know how many of them are reading lucene-user.

I'd prefer to keep this on the user list since the query parser is only 
loosely coupled to lucenes core, while it is strongly coupled to the users
needs. So I think the users should be included in the discussion and I think
the user list is the best place for that.


To unsubscribe, e-mail:
For additional commands, e-mail:

View raw message