lucene-java-user mailing list archives

From Dror Matalon <d...@zapatec.com>
Subject Re: Query Parser AND / OR
Date Tue, 30 Dec 2003 21:10:31 GMT
On Tue, Dec 30, 2003 at 09:13:30PM +0100, Morus Walter wrote:
...
> > > What's the meaning of a OR b c +d ?
> > > (Actually there must be two meanings, one for default or, one for default and).
> > > Maybe it's obvious, but I fail to see it.
> > 
> > You're right, it is confusing. Assuming default OR I would guess that the
> > above means 
> > b c +d 
> > and assuming default AND it would mean
> > +b +c +d 
> > Is there another interpretation?
> > 
> You left out the 'a' which I intended to be part of the query (sorry if this
> was unclear).

Oops, my mistake.

> 
> I was thinking about this issue, and currently I think that the only way to 
> define this type of query formally is to give the default operator its own
> precedence relative to the precedence of 'OR' and 'AND'.
> So there are two possibilities:
> either the default operator has higher precedence than 'AND' or lower than 
> 'OR'.
> For default OR in the first case
> `a OR b c +d' would be equal to `(a OR b) c +d' == (a b) c +d
> in the second to `a OR (b c +d)' == a (b c +d) 
> For default AND one has `+(a b) +c +d' and `a (+b +c +d)'
> 
> (a b) c +d searches all documents containing d; occurrences of a, b and c 
> influence scoring
> a (b c +d) searches documents containing `a' joined with documents 
> containing `d' (where b and c influence scoring)
> Now, what's closer to what one might have meant by `a OR b c +d'?
> 
> +(a b) +c +d searches documents containing c, d and either a or b.
> a (+b +c +d) searches documents containing a or each of b, c and d.

I don't think this is a good idea. Mostly because it would be hard to
explain/document, and you don't want end users to have to think and read
a lot of documentation when doing a search.

For one thing, I would advocate keeping the '+' notation as the
underlying syntax and migrating to boolean operators, since many
more people are used to that syntax and I believe it's better
understood.

> 
> The other alternative would be to forbid queries mixing default operators and
> explicit and/or. This is what I'd probably vote for at the moment.

At first I was inclined to agree but as a rule I think we should adopt
the WWGD (What Would Google Do) philosophy, since that's the syntax and
behavior that most people are used to.

It looks like Google basically adds an "AND" between any two terms that
don't have an operator between them. We could do the same for both the
default AND and the default OR. Once you've done that, you just use the
standard boolean logic precedence rule.
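
To make the idea concrete, here is a minimal sketch of that two-step
approach. It is not the Lucene query parser and does not use any Lucene
API; the function names are made up for illustration, parentheses are not
handled, and a '+' prefix is simply carried along as part of the term.

```python
def insert_default_operator(tokens, default="AND"):
    """Insert `default` between adjacent terms that have no explicit operator.

    Hypothetical helper: only AND and OR are treated as operators.
    """
    operators = {"AND", "OR"}
    out = []
    for tok in tokens:
        # Two terms in a row: make the implied operator explicit.
        if out and tok not in operators and out[-1] not in operators:
            out.append(default)
        out.append(tok)
    return out


def group_by_precedence(tokens):
    """Group a fully explicit token list so that AND binds tighter than OR.

    Returns a list of OR-alternatives, each a list of AND-ed terms.
    """
    or_groups = [[]]
    for tok in tokens:
        if tok == "OR":
            or_groups.append([])      # OR starts a new alternative
        elif tok != "AND":
            or_groups[-1].append(tok)  # terms joined by AND stay together
    return or_groups


explicit = insert_default_operator("a OR b c +d".split(), default="AND")
groups = group_by_precedence(explicit)
print(explicit)
print(groups)
```

With default AND, `a OR b c +d` becomes `a OR b AND c AND +d`, which
standard precedence groups as `a OR (b AND c AND +d)`. With default OR
every term ends up as its own alternative.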

Now the good news on all of this is that it seems (I did a small test)
that if you use parentheses the parser does the right thing. In my mind,
it's a good idea to use parentheses whenever you're creating complex
expressions.
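
As an illustration of why the explicit grouping matters, the sketch below
checks the two default-OR groupings Morus lists above against a single
document. It is a toy matcher, not Lucene code. It assumes the matching
rule his descriptions imply: if a group has required ('+') clauses, all of
them must match; otherwise at least one optional clause must match. The
`matches` function and the tuple representation are hypothetical.

```python
def matches(query, doc_terms):
    """Return True if the set doc_terms satisfies the query.

    A query is either a term string or a list of (required, subquery)
    clauses. Assumed rule: required clauses must all match; if there are
    none, at least one optional clause must match.
    """
    if isinstance(query, str):
        return query in doc_terms
    required = [q for req, q in query if req]
    optional = [q for req, q in query if not req]
    if required:
        return all(matches(q, doc_terms) for q in required)
    return any(matches(q, doc_terms) for q in optional)


# (a b) c +d
first = [(False, [(False, "a"), (False, "b")]), (False, "c"), (True, "d")]
# a (b c +d)
second = [(False, "a"), (False, [(False, "b"), (False, "c"), (True, "d")])]

doc = {"a"}  # a document containing only the term a
print(matches(first, doc), matches(second, doc))
```

A document containing only `a` is rejected by the first grouping, because
`d` is required at the top level, and accepted by the second, where `a`
alone is enough. Scoring is ignored here.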

> 
> The patch doesn't implement any of these, as it handles the default operator
> on the same level as AND.
> 
> > > > 
> > > > As far as the actual patch, I would suspect that a better approach than
> > > > using java would be to use precedence operations in the actual parser.
> > > 
> > > Then you decide to do a complete rewrite of the query parser.
> > > That's something I wanted to avoid.
> > 
> > Ouch. I think you might be right. It might be a good idea to move this
> > discussion to lucene-dev where we'd get more attention from the
> > developers. This seems more like a developer issue than a user issue.
> > 
> Hmm. That'd be up to the developers.
> Don't know how many of them are reading lucene-user.
> 
> I'd prefer to keep this on the user list since the query parser is only 
> loosely coupled to Lucene's core, while it is strongly coupled to the users'
> needs. So I think the users should be included in the discussion and I think
> the user list is the best place for that.
> 

And Erik indicated that they're here anyway, so it's fine.

Regards,

Dror

> Morus
> 
> ---------------------------------------------------------------------
> To unsubscribe, e-mail: lucene-user-unsubscribe@jakarta.apache.org
> For additional commands, e-mail: lucene-user-help@jakarta.apache.org
> 

-- 
Dror Matalon
Zapatec Inc 
1700 MLK Way
Berkeley, CA 94709
http://www.fastbuzz.com
http://www.zapatec.com
