From: Adriano Crestani
Date: Mon, 3 May 2010 01:11:52 -0400
Subject: Re: Questions about the new query parser framework
To: java-user@lucene.apache.org

Hi Daniel,

> 1. Is it intentional that query nodes do not implement equals()? I
> had rather a lot of overhead when writing unit tests due to being
> unable to use it - it's either (a) define a Matcher for every single
> QueryNode class, or (b) toString() it and perform some sanitisation
> (which is what we're doing.)

Good point! QueryNode(s) are data objects, so it makes sense to
override their equals method. But first we need to define what
QueryNode equality means. Should two nodes be considered equal if they
represent the same query syntactically, or semantically? For example,
two ORQueryNode objects parsed from queries that differ only in the
order of their OR operands will not have the same children ordering,
so they are syntactically not equal. They are semantically equal,
though, because the order of the OR operands (usually) does not matter
when the query is executed. I say "usually" because that is up to the
implementation of the Query object built from the ORQueryNode. For
this reason, I vote for defining two query nodes as equal if they are
syntactically equal.
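To make the proposal concrete, here is a minimal sketch of syntactic equality on a toy node type. Everything in it is an assumption for illustration: Node, its label field and its tags map are hypothetical and are not Lucene's QueryNode classes. Children are compared in order, and tags are left out of the comparison.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.HashMap;
import java.util.List;
import java.util.Map;
import java.util.Objects;

/** Toy model only: these are NOT Lucene's QueryNode classes. */
public class SyntacticEqualitySketch {

    /** Hypothetical node: a label (e.g. "OR" or a term), ordered children, and tags. */
    static final class Node {
        final String label;
        final List<Node> children;
        // Tags carry extra info between processors; equals() deliberately ignores them.
        final Map<String, Object> tags = new HashMap<>();

        Node(String label, Node... children) {
            this.label = label;
            this.children = new ArrayList<>(Arrays.asList(children));
        }

        @Override
        public boolean equals(Object other) {
            if (this == other) return true;
            if (!(other instanceof Node)) return false;
            Node that = (Node) other;
            // List.equals is order-sensitive, which is what makes this syntactic equality.
            return label.equals(that.label) && children.equals(that.children);
        }

        @Override
        public int hashCode() {
            // Must agree with equals(): same fields, tags excluded.
            return Objects.hash(label, children);
        }
    }

    public static void main(String[] args) {
        Node ab = new Node("OR", new Node("a"), new Node("b"));
        Node abTagged = new Node("OR", new Node("a"), new Node("b"));
        abTagged.tags.put("seenByProcessor", Boolean.TRUE);
        Node ba = new Node("OR", new Node("b"), new Node("a"));

        System.out.println(ab.equals(abTagged)); // true: same structure, tags ignored
        System.out.println(ab.equals(ba));       // false: operands are in a different order
    }
}
```

With an equals() like this, a unit test could compare an expected tree against a parsed one directly, at the cost of treating reordered OR operands as different.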
I also vote for excluding query node tags from the equality check,
because they are not meant to represent the query structure but to
attach extra info to the node, which is usually used for communication
between processors.

> 2. Is there a plan to introduce a QuerySyntaxFormatter interface as
> a counterpart to QuerySyntaxParser, for generating the same query
> format using the nodes that would have been generated when parsing it
> (obviously with a small change in format in some situations)?

I actually never liked how the QueryNode -> query string conversion is
done today, using the QueryNode.toQueryString(...) method. A QueryNode
shouldn't be responsible for converting itself back to the string
format, because different SyntaxParser(s) may create the same node
type, e.g. an ORQueryNode, from different syntaxes, so what should
orQueryNode.toQueryString(...) return? So a QuerySyntaxFormatter makes
sense. Now we need to start working on what this interface should look
like, so that SyntaxParser implementors can start implementing
equivalent QuerySyntaxFormatter(s).

> 3. I have been parsing a lot of boolean queries, and have noticed
> that there is *always* a GroupQueryNode around any BooleanQueryNode.
> Is this really required, given that BooleanQueryNode is already
> implicitly a grouping type of query?
>
> 4. If GroupQueryNode is specifically a cue to whether the user
> specified parentheses or not (i.e. if it is supposed to be cosmetic,
> for the purposes of getting back to what the user typed in) then why
> is it that "tag:a tag:b" and "tag:(a b)" both parse to the same node
> structure (making it impossible to figure out which the user actually
> used)?

Yes, it's created when parentheses are used. The standard query
processors need to know where parentheses were typed so they can
enforce Lucene operator precedence, which is not trivial and relies on
some conditions on whether or not the user typed the parentheses.
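As a rough illustration of the idea, here is a toy sketch in which a marker node records that parentheses were typed, and a later pass unwraps it once it is no longer needed. Node, Group and stripGroups are hypothetical names invented for this sketch. They are not Lucene's classes, and the pass is not the actual logic of the standard processors.

```java
import java.util.ArrayList;
import java.util.List;

/** Toy model only: these are NOT Lucene's QueryNode classes. */
public class GroupNodeSketch {

    /** Hypothetical tree node with a label and ordered children. */
    static class Node {
        final String label;
        final List<Node> children = new ArrayList<>();

        Node(String label, Node... kids) {
            this.label = label;
            for (Node kid : kids) children.add(kid);
        }

        @Override
        public String toString() {
            return children.isEmpty() ? label : label + children;
        }
    }

    /** Hypothetical marker node: records that the user typed parentheses here. */
    static final class Group extends Node {
        Group(Node child) {
            super("GROUP", child);
        }
    }

    /** Returns a copy of the tree with every Group replaced by its single child. */
    static Node stripGroups(Node node) {
        if (node instanceof Group) return stripGroups(node.children.get(0));
        Node copy = new Node(node.label);
        for (Node child : node.children) copy.children.add(stripGroups(child));
        return copy;
    }

    public static void main(String[] args) {
        Node plain = new Node("BOOLEAN", new Node("a"), new Node("b"));
        Node parenthesized = new Group(new Node("BOOLEAN", new Node("a"), new Node("b")));

        System.out.println(plain);                      // BOOLEAN[a, b]
        System.out.println(parenthesized);              // GROUP[BOOLEAN[a, b]]
        System.out.println(stripGroups(parenthesized)); // BOOLEAN[a, b]
    }
}
```

In this model the two trees are distinguishable only until the unwrapping pass runs, so any code that needs to know whether parentheses were typed has to look at the tree before that pass.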
StandardSyntaxParser generates different query node trees for these
two queries, one with a GroupQueryNode and the other without. However,
after the query node tree is sent through the
StandardQueryNodeProcessorPipeline, the tree is optimized and
GroupQueryNode(s) are usually removed.

Best Regards,
Adriano Crestani

On Sun, May 2, 2010 at 7:47 PM, Daniel Noll wrote:
> Hi all.
>
> I have been using the new query parser framework fairly heavily,
> although our use case is largely for *generating* queries rather than
> parsing them - the intermediate query nodes happened to be a very good
> model for doing this without all the usual nightmares of thinking
> about the escape syntax, and without having to think about how each
> query is encoded, which is the usual drawback of using Query objects
> directly.
>
> But I have some questions.
>
> 1. Is it intentional that query nodes do not implement equals()? I
> had rather a lot of overhead when writing unit tests due to being
> unable to use it - it's either (a) define a Matcher for every single
> QueryNode class, or (b) toString() it and perform some sanitisation
> (which is what we're doing.)
>
> 2. Is there a plan to introduce a QuerySyntaxFormatter interface as
> a counterpart to QuerySyntaxParser, for generating the same query
> format using the nodes that would have been generated when parsing it
> (obviously with a small change in format in some situations)?
>
> 3. I have been parsing a lot of boolean queries, and have noticed
> that there is *always* a GroupQueryNode around any BooleanQueryNode.
> Is this really required, given that BooleanQueryNode is already
> implicitly a grouping type of query?
>
> 4. If GroupQueryNode is specifically a cue to whether the user
> specified parentheses or not (i.e. if it is supposed to be cosmetic,
> for the purposes of getting back to what the user typed in) then why
> is it that "tag:a tag:b" and "tag:(a b)" both parse to the same node
> structure (making it impossible to figure out which the user actually
> used)?
>
> Daniel
>
> --
> Daniel Noll            Forensic and eDiscovery Software
> Senior Developer       The world's most advanced
> Nuix                   email data analysis
> http://nuix.com/       and eDiscovery software