Return-Path: Delivered-To: apmail-jakarta-lucene-user-archive@www.apache.org Received: (qmail 80107 invoked from network); 30 Dec 2003 21:10:42 -0000 Received: from daedalus.apache.org (HELO mail.apache.org) (208.185.179.12) by minotaur-2.apache.org with SMTP; 30 Dec 2003 21:10:42 -0000 Received: (qmail 42970 invoked by uid 500); 30 Dec 2003 21:10:26 -0000 Delivered-To: apmail-jakarta-lucene-user-archive@jakarta.apache.org Received: (qmail 42944 invoked by uid 500); 30 Dec 2003 21:10:26 -0000 Mailing-List: contact lucene-user-help@jakarta.apache.org; run by ezmlm Precedence: bulk List-Unsubscribe: List-Subscribe: List-Help: List-Post: List-Id: "Lucene Users List" Reply-To: "Lucene Users List" Delivered-To: mailing list lucene-user@jakarta.apache.org Received: (qmail 42931 invoked from network); 30 Dec 2003 21:10:25 -0000 Received: from unknown (HELO five.zapatec.com) (66.117.150.100) by daedalus.apache.org with SMTP; 30 Dec 2003 21:10:25 -0000 Received: from rlx11.zapatec.com (rlx11.pr.zapatec.com [192.168.1.132]) by five.zapatec.com (Postfix) with ESMTP id AE2225D22 for ; Tue, 30 Dec 2003 13:10:31 -0800 (PST) Received: (from dror@localhost) by rlx11.zapatec.com (8.12.3/8.12.3/Submit) id hBULAVxo016960 for lucene-user@jakarta.apache.org; Tue, 30 Dec 2003 13:10:31 -0800 (PST) (envelope-from dror) Date: Tue, 30 Dec 2003 13:10:31 -0800 From: Dror Matalon To: Lucene Users List Subject: Re: Query Parser AND / OR Message-ID: <20031230211031.GZ95420@rlx11.zapatec.com> References: <16367.966.191362.389052@onlinehome.de> <16367.2273.546534.719218@onlinehome.de> <88DD6FD3-3993-11D8-AAFB-000393A564E6@ehatchersolutions.com> <20031229200724.GP95420@rlx11.zapatec.com> <16369.22033.684616.614235@onlinehome.de> <20031230183558.GV95420@rlx11.zapatec.com> <16369.56426.665028.54190@onlinehome.de> Mime-Version: 1.0 Content-Type: text/plain; charset=us-ascii Content-Disposition: inline In-Reply-To: <16369.56426.665028.54190@onlinehome.de> X-Spam-Rating: daedalus.apache.org 1.6.2 0/1000/N X-Spam-Rating: minotaur-2.apache.org 1.6.2 0/1000/N On Tue, Dec 30, 2003 at 09:13:30PM +0100, Morus Walter wrote: ... > > > What's the meaning of a OR b c +d ? > > > (Acutally there must be two meanings, one for default or, one for default and). > > > Maybe it's obvious, but I fail to see it. > > > > You're right, it is confusing. Assuming default OR I would gess that the > > above means > > b c +d > > and assuming default AND it would mean > > +b +c +d > > Is there another interpretation? > > > You left out the 'a' which I intended to be part of the query (sorry if this > was unclear). Oops, my mistake. > > I was thinking about this issue, and currently I think that the only way to > define this type of queries formally, is to give the default operator it's own > precedence relativly to the precedence of 'OR' and 'AND'. > So there are two possibilities: > either the default operator has higher precedence than 'AND' or lower than > 'OR'. > For default OR in the first case > `a OR b c +d' would be equal to `(a OR b) c +d' == (a b) c +d > in the second to `a OR (b c +d)' == a (b c +d) > For default AND one has `+(a b) +c +d' and `a (+b +c +d)' > > (a b) c +d searches all documents containing d, occurences of a, b and c > influence scoring > a (b c +d) searches documents containing `a' joined with documents > containing `d' (where b and c influcence scoring) > Now, what's closer to what one might have meant by `a OR b c +d'? > > +(a b) +c +d searches documents containing c, d and either a or b. > a (+b +c +d) searches documents containing a or each of b, c and d. I don't think this is a good idea. Mostly because it would be hard to explain/document, and you don't want end users to have to think and read a lot of documentation when doing a search. For one thing, I would advocate for using the '+' notation as the underlying syntax and migrating to boolean operators since that's many more people are used to that syntax, and I believe it's better understood. > > The other alternative would be to forbid queries mixing default operators and > explicit and/or. This is what I'd probably vote for at the moment. At first I was inclined to agree but as a rule I think we should adopt the WWGD (What Would Google Do) philosophy, since that's the syntax and behavior that most people are used to. It looks like it basically adds an "AND" between any two terms that don't have operator between them. We could do the same for both the default AND and the default OR. Once you've done that, you just use the standard boolean logic precedence rule. Now the good news on all of this is that it seems (I did a small test), that if you use parenthesis the parser does the right thing. In my mind, it's a good idea to use parenthesis whenever you're creating complex expressions. > > The patch doesn't implement any of these, as it handles the default operator > on the same level as AND. > > > > > > > > > As far as the actual patch, I would suspect that a better approach than > > > > using java would be to use precedence operations in the actual parser. > > > > > > Then you decide to do a complete rewrite of the query parser. > > > That's something I wanted to avoid. > > > > Ouch. I think you might be right. It might be a good idea to move this > > discussion to lucene-dev where we'd get more attention from the > > developers. This seems more like a developer issue than a user issue. > > > Hmm. That's be up to the developers. > Don't know how many of them are reading lucene-user. > > I'd prefer to keep this on the user list since the query parser is only > loosely coupled to lucenes core, while it is strongly coupled to the users > needs. So I think the users should be included in the discussion and I think > the user list is the best place for that. > And Erik indicated that they're here anyway, so it's fine. Regards, Dror > Morus > > --------------------------------------------------------------------- > To unsubscribe, e-mail: lucene-user-unsubscribe@jakarta.apache.org > For additional commands, e-mail: lucene-user-help@jakarta.apache.org > -- Dror Matalon Zapatec Inc 1700 MLK Way Berkeley, CA 94709 http://www.fastbuzz.com http://www.zapatec.com --------------------------------------------------------------------- To unsubscribe, e-mail: lucene-user-unsubscribe@jakarta.apache.org For additional commands, e-mail: lucene-user-help@jakarta.apache.org