Return-Path: Delivered-To: apmail-lucene-java-dev-archive@www.apache.org Received: (qmail 89522 invoked from network); 28 Apr 2009 05:54:55 -0000 Received: from hermes.apache.org (HELO mail.apache.org) (140.211.11.3) by minotaur.apache.org with SMTP; 28 Apr 2009 05:54:55 -0000 Received: (qmail 60870 invoked by uid 500); 28 Apr 2009 05:54:54 -0000 Delivered-To: apmail-lucene-java-dev-archive@lucene.apache.org Received: (qmail 60793 invoked by uid 500); 28 Apr 2009 05:54:54 -0000 Mailing-List: contact java-dev-help@lucene.apache.org; run by ezmlm Precedence: bulk List-Help: List-Unsubscribe: List-Post: List-Id: Reply-To: java-dev@lucene.apache.org Delivered-To: mailing list java-dev@lucene.apache.org Received: (qmail 60785 invoked by uid 99); 28 Apr 2009 05:54:54 -0000 Received: from nike.apache.org (HELO nike.apache.org) (192.87.106.230) by apache.org (qpsmtpd/0.29) with ESMTP; Tue, 28 Apr 2009 05:54:54 +0000 X-ASF-Spam-Status: No, hits=-2000.0 required=10.0 tests=ALL_TRUSTED X-Spam-Check-By: apache.org Received: from [140.211.11.140] (HELO brutus.apache.org) (140.211.11.140) by apache.org (qpsmtpd/0.29) with ESMTP; Tue, 28 Apr 2009 05:54:52 +0000 Received: from brutus (localhost [127.0.0.1]) by brutus.apache.org (Postfix) with ESMTP id 5911E234C003 for ; Mon, 27 Apr 2009 22:54:30 -0700 (PDT) Message-ID: <1358779345.1240898070349.JavaMail.jira@brutus> Date: Mon, 27 Apr 2009 22:54:30 -0700 (PDT) From: "Bertrand Delacretaz (JIRA)" To: java-dev@lucene.apache.org Subject: [jira] Commented: (LUCENE-1567) New flexible query parser In-Reply-To: <1213324183.1237435190562.JavaMail.jira@brutus> MIME-Version: 1.0 Content-Type: text/plain; charset=utf-8 Content-Transfer-Encoding: 7bit X-JIRA-FingerPrint: 30527f35849b9dde25b450d4833f0394 X-Virus-Checked: Checked by ClamAV on apache.org [ https://issues.apache.org/jira/browse/LUCENE-1567?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12703504#action_12703504 ] Bertrand Delacretaz commented on LUCENE-1567: --------------------------------------------- Grant, the ip-clearance document that you created under incubator-public in svn had not been added to the site-publish folder, I just did that in revision 769253. If that's not correct, please remove both xml and html versions of the lucene-query-parser file there. > New flexible query parser > ------------------------- > > Key: LUCENE-1567 > URL: https://issues.apache.org/jira/browse/LUCENE-1567 > Project: Lucene - Java > Issue Type: New Feature > Components: QueryParser > Environment: N/A > Reporter: Luis Alves > Assignee: Grant Ingersoll > Attachments: lucene_trunk_FlexQueryParser_2009March24.patch, lucene_trunk_FlexQueryParser_2009March26_v3.patch > > > From "New flexible query parser" thread by Micheal Busch > in my team at IBM we have used a different query parser than Lucene's in > our products for quite a while. Recently we spent a significant amount > of time in refactoring the code and designing a very generic > architecture, so that this query parser can be easily used for different > products with varying query syntaxes. > This work was originally driven by Andreas Neumann (who, however, left > our team); most of the code was written by Luis Alves, who has been a > bit active in Lucene in the past, and Adriano Campos, who joined our > team at IBM half a year ago. Adriano is Apache committer and PMC member > on the Tuscany project and getting familiar with Lucene now too. > We think this code is much more flexible and extensible than the current > Lucene query parser, and would therefore like to contribute it to > Lucene. I'd like to give a very brief architecture overview here, > Adriano and Luis can then answer more detailed questions as they're much > more familiar with the code than I am. > The goal was it to separate syntax and semantics of a query. E.g. 'a AND > b', '+a +b', 'AND(a,b)' could be different syntaxes for the same query. > We distinguish the semantics of the different query components, e.g. > whether and how to tokenize/lemmatize/normalize the different terms or > which Query objects to create for the terms. We wanted to be able to > write a parser with a new syntax, while reusing the underlying > semantics, as quickly as possible. > In fact, Adriano is currently working on a 100% Lucene-syntax compatible > implementation to make it easy for people who are using Lucene's query > parser to switch. > The query parser has three layers and its core is what we call the > QueryNodeTree. It is a tree that initially represents the syntax of the > original query, e.g. for 'a AND b': > AND > / \ > A B > The three layers are: > 1. QueryParser > 2. QueryNodeProcessor > 3. QueryBuilder > 1. The upper layer is the parsing layer which simply transforms the > query text string into a QueryNodeTree. Currently our implementations of > this layer use javacc. > 2. The query node processors do most of the work. It is in fact a > configurable chain of processors. Each processors can walk the tree and > modify nodes or even the tree's structure. That makes it possible to > e.g. do query optimization before the query is executed or to tokenize > terms. > 3. The third layer is also a configurable chain of builders, which > transform the QueryNodeTree into Lucene Query objects. > Furthermore the query parser uses flexible configuration objects, which > are based on AttributeSource/Attribute. It also uses message classes that > allow to attach resource bundles. This makes it possible to translate > messages, which is an important feature of a query parser. > This design allows us to develop different query syntaxes very quickly. > Adriano wrote the Lucene-compatible syntax in a matter of hours, and the > underlying processors and builders in a few days. We now have a 100% > compatible Lucene query parser, which means the syntax is identical and > all query parser test cases pass on the new one too using a wrapper. > Recent posts show that there is demand for query syntax improvements, > e.g improved range query syntax or operator precedence. There are > already different QP implementations in Lucene+contrib, however I think > we did not keep them all up to date and in sync. This is not too > surprising, because usually when fixes and changes are made to the main > query parser, people don't make the corresponding changes in the contrib > parsers. (I'm guilty here too) > With this new architecture it will be much easier to maintain different > query syntaxes, as the actual code for the first layer is not very much. > All syntaxes would benefit from patches and improvements we make to the > underlying layers, which will make supporting different syntaxes much > more manageable. -- This message is automatically generated by JIRA. - You can reply to this email to add a comment to the issue online. --------------------------------------------------------------------- To unsubscribe, e-mail: java-dev-unsubscribe@lucene.apache.org For additional commands, e-mail: java-dev-help@lucene.apache.org