From: Erik Hatcher
Subject: Re: QueryParser refactoring
Date: Tue, 8 Mar 2005 03:45:49 -0500
To: java-user@lucene.apache.org

On Mar 8, 2005, at 2:29 AM, Morus Walter wrote:
> Erik Hatcher writes:
>>> Your changes look great in general, though I find some issues:
>>>
>>> 1) 'stop OR stop AND stop' where stop is a stopword gives a parse
>>> error:
>>> Encountered "" at line 1, column 0.
>>> Was expecting one of:
>>> ...
>>> ...
>>
>> I think you must have tried this in a transient state when I forgot
>> to check in some JavaCC generated files. Try again. This one now
>> returns an empty BooleanQuery.
>>
> ok.
> I'm a bit puzzled, since I called javacc myself, so generated files
> should not matter, but if it's fixed, I don't care about what went
> wrong.

Let me know if there is still an issue. I added this exact case to TestPrecedenceQueryParser, and it's currently working for me.

>>> 2) Single term queries using +/- flags are parsed to a query without
>>> flag
>>> +a -> a
>>
>> Hmmm.... this is a debatable one. It's returning a TermQuery in this
>> case for "a". Is that appropriate? Or should it return a
>> BooleanQuery with a single TermQuery as required?
>>
> I'd prefer, if query parser parses queries created by query.toString()
> to the same query. But that's just a nice to have.

It's also impossible to have in general. Here's a simple example: take a Query that is equivalent to A OR B. Its .toString() is "A B"; parse that with the default operator set to AND and you'll get "+A +B". I created a modified Query->String converter for my current daytime project (I use a String representation for the most-recently-used drop-down, which is stored as a client-side cookie) that explicitly puts "OR" between SHOULD BooleanClauses.
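The round-trip problem described above can be sketched with a toy model. None of this is Lucene code: `Occur`, `Clause`, `implicit`, `explicitOr`, and `parseFlat` are hypothetical names invented for illustration, and `parseFlat` is a deliberately naive reader, not QueryParser's grammar. It only shows why a bare "A B" is ambiguous once the default operator is AND, and how writing "OR" explicitly removes that ambiguity for SHOULD clauses.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;

// Toy model only: these types and methods are hypothetical, not Lucene classes.
class ExplicitOrFormatter {
    enum Occur { MUST, SHOULD, MUST_NOT }

    static final class Clause {
        final Occur occur;
        final String term;
        Clause(Occur occur, String term) { this.occur = occur; this.term = term; }
    }

    // Renders one clause with its +/- marker; SHOULD clauses get no marker.
    static String render(Clause c) {
        switch (c.occur) {
            case MUST: return "+" + c.term;
            case MUST_NOT: return "-" + c.term;
            default: return c.term;
        }
    }

    // toString-style output: clauses separated by spaces, so "A OR B" becomes "A B".
    static String implicit(List<Clause> clauses) {
        StringBuilder sb = new StringBuilder();
        for (Clause c : clauses) {
            if (sb.length() > 0) sb.append(' ');
            sb.append(render(c));
        }
        return sb.toString();
    }

    // Same output, except that every SHOULD clause after the first clause is preceded by "OR".
    static String explicitOr(List<Clause> clauses) {
        StringBuilder sb = new StringBuilder();
        for (Clause c : clauses) {
            if (sb.length() > 0) sb.append(c.occur == Occur.SHOULD ? " OR " : " ");
            sb.append(render(c));
        }
        return sb.toString();
    }

    // Naive reader: a bare term next to "OR" is SHOULD; any other bare term
    // takes the default operator's role. Assumes whitespace-separated tokens.
    static List<Clause> parseFlat(String text, Occur defaultOccur) {
        List<Clause> out = new ArrayList<>();
        if (text.trim().isEmpty()) return out;
        String[] tokens = text.trim().split("\\s+");
        for (int i = 0; i < tokens.length; i++) {
            String t = tokens[i];
            if (t.equals("OR")) continue;
            if (t.startsWith("+")) { out.add(new Clause(Occur.MUST, t.substring(1))); continue; }
            if (t.startsWith("-")) { out.add(new Clause(Occur.MUST_NOT, t.substring(1))); continue; }
            boolean orBefore = i > 0 && tokens[i - 1].equals("OR");
            boolean orAfter = i + 1 < tokens.length && tokens[i + 1].equals("OR");
            out.add(new Clause(orBefore || orAfter ? Occur.SHOULD : defaultOccur, t));
        }
        return out;
    }

    public static void main(String[] args) {
        List<Clause> aOrB = Arrays.asList(
                new Clause(Occur.SHOULD, "A"), new Clause(Occur.SHOULD, "B"));
        // Implicit form re-read with default AND: prints "+A +B" (meaning changed).
        System.out.println(implicit(parseFlat(implicit(aOrB), Occur.MUST)));
        // Explicit form re-read with default AND: prints "A B" (both still SHOULD).
        System.out.println(implicit(parseFlat(explicitOr(aOrB), Occur.MUST)));
    }
}
```

Even in this toy, a SHOULD clause that is not adjacent to an "OR" (for example a single-clause query, or a leading SHOULD followed by a + clause) is still read with the default operator, so explicit "OR" narrows the ambiguity rather than eliminating it.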
I still believe that we need some query-parser-specific way to build strings from Query objects, though I haven't thought through exactly how that should be designed. For example, I'm building a very custom query parser for a client whose syntax looks nothing like QueryParser's. It would be very nice to be able to turn a Query back into their expression syntax.

>> I think having it optimized to a TermQuery makes the most sense.
>> Though, putting it in a BooleanQuery does make this next one
>> simpler...
>>
>>> -a -> a
>>> While this doesn't make a difference for +a it's a bit strange for
>>> -a, OTOH -a isn't a usable query anyway.
>>
>> Oops... yeah, you're right. If it's a single clause right now it
>> doesn't wrap in a BooleanQuery and thus does not take into account the
>> modifier +/-/NOT. But as you say, this is a bogus query anyway. I
>> guess the right thing to do is wrap both the +a query as above and the
>> -a query into a BooleanQuery with the modifier set appropriately.
>>
> Ok.
> The question how to handle BooleanQueries, that contain prohibited
> terms only, is a question on its own.
> In my fix I chose to silently drop these queries. Basically because
> it's effectively dropped during querying anyway.

Silently drop, as in you removed them entirely from the resultant Query? That would be easy enough to add, but is that what we want to happen? Community, thoughts?

> In an application, I handled this by dropping the query and notifying
> the user, that some part of the query could not be handled and was
> ignored.

How did your application notice that part of the query was dropped?

>>> 3) a OR NOT b parses to 'a -b' which is the same as 'a AND NOT b'
>>> IMHO `a OR NOT b' should be `a OR (NOT b)' though lucene cannot
>>> search that. Maybe it should raise an error...
>>
>> Actually it parses like this:
>>
>> a OR NOT b -> a -b
>> a AND NOT b -> +a -b
>>
>> So they are slightly different, though the effect will be the same.
>>
>>> a OR NOT b AND c (parsed to a -(+b +c)) should IMHO be parsed to
>>> `a (-b +c)'
>>
>> Ah, ok.... so NOT gets much higher precedence than I'm currently
>> giving it. That might take me a while to achieve, but I'll give it
>> a shot.
>>
> Great.

I've rearranged my local parser grammar somewhat. That broke some other tests, but NOT precedence now works. Here's a testSimple case that I broke by giving NOT higher precedence (I moved the point where Modifiers are taken into account; it is now before a Clause):

Query /+term -term term/ yielded /(+term) (-term) term/, expecting /+term -term term/

As you can see, this is wrong and I have more work to do. A OR NOT B now parses to A (-B), though, which I too now believe is a more correct (though invalid) interpretation.

	Erik

---------------------------------------------------------------------
To unsubscribe, e-mail: java-user-unsubscribe@lucene.apache.org
For additional commands, e-mail: java-user-help@lucene.apache.org