Message-ID: <4A8266EF.6020605@gmail.com>
Date: Tue, 11 Aug 2009 23:53:35 -0700
From: Michael Busch
To: java-dev@lucene.apache.org
Subject: Re: The new Contrib QueryParser should not be slated to replace the old one yet

Thanks, Jason! Glad the new QP is useful for you.

I'd like to explain a bit of the (IBM-internal) history of this new QP:

A few years ago we wanted to change/extend Lucene's query syntax. We did things similar to what you mention, like no stemming for quoted terms, additional syntax features, etc. Very soon we found out that it wasn't possible to extend the current QP without changing the javacc code. We maintained our own copy of the QP for a long time, which had several thousand lines of code. Fixes in Lucene's QP were sometimes hard to merge into our copy. Also, everyone who worked on it had to have javacc skills.
After managing our own copy for almost two years we decided to develop a new QP framework that allows changing or extending different parts of the QP individually. We had the goal of keeping the javacc part as small as possible and separated from the rest. That's the reason for the first layer of the new QP (SyntaxParser).

The second goal was to isolate building Query objects. This gives us the flexibility of quickly changing which Query objects to instantiate (QueryBuilders). Only the QueryBuilders need to have dependencies on Lucene. The result is that we can share most of the QP code with another internal team that doesn't even use Lucene. They can simply switch the QueryBuilder layer to build their own Query objects and can use most of the other parts.

That leaves semantic and linguistic features to the middle layer: QueryProcessors. We have several of them, they do the dirty work, and we share them internally across teams as well.

So sharing the QP code within IBM has been successful so far - three separate products are using it. That was the reason we thought that it would be useful for Lucene as well. There are currently different QP implementations in contrib. We thought it would be nice to switch them all to the same framework and share as much code as possible between them. This has the big advantage of increasing maintainability.

It will also be much easier to handle query syntax backwards-compatibility. E.g. if we want to change the RangeQuery syntax from [], {} to <=, >= (a change Lucene users asked for), we could simply create a new SyntaxParser implementation. Users would have to change one line of code to switch to the new syntax (instantiating the new SyntaxParser). Other users, who need to keep the old syntax, or maybe have a lot of saved queries with unescaped '<' and '=' characters, could keep the old SyntaxParser. How would you do that with the current QP without copying the entire thing? How would we maintain the copies?
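To make the three layers and the syntax switch concrete, here is a minimal Java sketch. It is an illustration only, not the contrib code: the names RangeNode, SyntaxParser, QueryProcessor, QueryBuilder, BRACKET_SYNTAX, COMPARISON_SYNTAX, LOWERCASE_FIELD and STRING_BUILDER are all hypothetical, the two toy grammars stand in for a javacc parser, and a plain String stands in for a Lucene Query object.

```java
import java.util.Collections;
import java.util.List;
import java.util.Locale;

class QueryPipelineSketch {

    /** Syntax-independent tree node; a null bound means "unbounded". */
    static final class RangeNode {
        final String field, lower, upper;
        RangeNode(String field, String lower, String upper) {
            this.field = field; this.lower = lower; this.upper = upper;
        }
    }

    /** Layer 1: text to node. Only this layer knows the query syntax. */
    interface SyntaxParser { RangeNode parse(String text); }

    /** Layer 2: node to node (semantic and linguistic rewrites). */
    interface QueryProcessor { RangeNode process(RangeNode node); }

    /** Layer 3: node to target object; the only engine-specific layer. */
    interface QueryBuilder<Q> { Q build(RangeNode node); }

    static String openIfStar(String bound) {
        return "*".equals(bound) ? null : bound;
    }

    /** Toy bracket syntax, e.g. "Price:[10 TO *]" (no real validation). */
    static final SyntaxParser BRACKET_SYNTAX = text -> {
        int colon = text.indexOf(':');
        String[] bounds = text.substring(colon + 2, text.length() - 1).split(" TO ");
        return new RangeNode(text.substring(0, colon),
                openIfStar(bounds[0]), openIfStar(bounds[1]));
    };

    /** Toy comparison syntax, e.g. "Price>=10" or "Price<=20". */
    static final SyntaxParser COMPARISON_SYNTAX = text -> {
        int ge = text.indexOf(">=");
        if (ge >= 0) {
            return new RangeNode(text.substring(0, ge), text.substring(ge + 2), null);
        }
        int le = text.indexOf("<=");
        if (le < 0) {
            throw new IllegalArgumentException("expected >= or <= in: " + text);
        }
        return new RangeNode(text.substring(0, le), null, text.substring(le + 2));
    };

    /** Example processor: normalize field names to lower case. */
    static final QueryProcessor LOWERCASE_FIELD = n ->
            new RangeNode(n.field.toLowerCase(Locale.ROOT), n.lower, n.upper);

    /** Example builder: renders a String where a real one would build a Query. */
    static final QueryBuilder<String> STRING_BUILDER = n ->
            "range(" + n.field + ", "
                    + (n.lower == null ? "*" : n.lower) + ", "
                    + (n.upper == null ? "*" : n.upper) + ")";

    /** Runs the three layers in order: parse, process, build. */
    static <Q> Q run(String text, SyntaxParser parser,
                     List<QueryProcessor> processors, QueryBuilder<Q> builder) {
        RangeNode node = parser.parse(text);
        for (QueryProcessor p : processors) {
            node = p.process(node);
        }
        return builder.build(node);
    }

    public static void main(String[] args) {
        List<QueryProcessor> processors = Collections.singletonList(LOWERCASE_FIELD);
        // Old syntax and new syntax differ only in the parser argument.
        System.out.println(run("Price:[10 TO *]", BRACKET_SYNTAX, processors, STRING_BUILDER));
        System.out.println(run("Price>=10", COMPARISON_SYNTAX, processors, STRING_BUILDER));
    }
}
```

Both calls in main print range(price, 10, *). In this sketch, switching syntaxes means passing a different parser, and the processors and the builder are reused unchanged; a team that does not use Lucene would swap only the builder.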
Of course it would have been preferable to develop the whole thing transparently in public. However, it takes time to get approvals to open source code, so we decided to continue with the implementation in parallel to save time. We thought the easiest way to help users switch over to the new QP would be to create an implementation that behaves 100% like the current QP. So Luis and Adriano worked hard on creating a new, 100% Lucene-compatible implementation, with a wrapper class that allows using the new QP exactly like the old one and even running all old unit tests.

The major concern about the new QP now is its complexity. I don't disagree: the learning curve of a component that has dozens of classes is higher compared to a single class. Luis and Adriano did the implementation in this very structured way based on the experience they gathered internally in the past. Of course we could have implemented everything less generically, in fewer classes. Maybe that is better for Lucene. We can still change and improve that - and those are the discussions we have to have now. We should discuss which abstractions make sense and where we can condense the code. For these discussions it would be helpful to understand the difference between the core classes of the framework (not very many!) and the Lucene compatibility implementation.

We should also realize that - thanks to Luis and Adriano - we now have actual code that can be the basis of discussions and that we can take and improve. No matter if this new QP is going to replace the old one or not, I'm very thankful that the two went through the effort of creating it. This framework has been very successful internally and we wanted to share something good with the Lucene community.

 Michael

On 8/11/09 10:44 PM, Jason Rutherglen wrote:
> I'm starting to use the new parser to emulate Google's queries
> (i.e. a phrase query with a single term means no-stemming,
> something the current QP doesn't allow because it converts the
> quoted query into a term query inside the JavaCC portion). It's
> been very straightforward and logical to use (so far).
>
> Thanks to the contrib query parser team!
>
> On Tue, Aug 11, 2009 at 10:54 AM, Mark Miller wrote:
>
>> I don't think we should stick with the current path of replacing the current
>> QueryParser with the new contrib QueryParser in Lucene 3.0.
>>
>> The new QueryParser has not been used much at all yet. Its interfaces (which
>> will need to abide by back compat in core) have not been vetted enough.
>>
>> The new parser appears to add complication to some of the things that were very
>> simple with the old parser.
>>
>> The main benefits of the new parser are claimed to be the ability to plug
>> and play many syntaxes and QueryBuilders. This is not an end user benefit
>> though and I'm not even sure how much of a benefit it is to us. There is
>> currently only one impl. It seems to me, once you start another impl, it's a
>> long shot that the exact same query tree representation is going to work
>> with a completely different syntax. Sure, if you are just doing postfix
>> rather than prefix, it will be fine - but the stuff that would likely be
>> done - actual new syntaxes - are not likely to be very pluggable. If a
>> syntax can map to the same query tree, I think we would likely stick to a
>> single syntax - else suffer the confusion and maintenance headaches for
>> syntactic sugar. More than a well factored QueryParser that can more easily
>> allow different syntaxes to map to the same query tree representation, I
>> think we just want a single solid syntax for core Lucene that supports Spans
>> to some degree. We basically have that now, sans the spans support. Other,
>> more exotic QueryParsers should live in contrib, as they do now.
>>
>> Which isn't to say this QueryParser should not one day rule the roost - but
>> I don't think it's earned the right yet. And I don't think there is a hurry
>> to toss the old parser.
>>
>> Personally, I think that the old parser should not be deprecated. Let's let
>> the new parser breathe in contrib for a bit. Let's see if anyone actually adds
>> any other syntaxes. Let's see if the pluggability results in any
>> improvements. Let's see if some of the harder things to do (overriding query
>> build methods?) become easier or keep people from using the new parser.
>>
>> Let's just see if the new parser draws users without us forcing them to it.
>> And let's also wait and see what other committers say - not many have gotten
>> much time to deal with the new parser, or deal with user list questions on
>> it.
>>
>> I just think it's premature to start moving people to this new parser. It
>> didn't even really get in until right before release - the paint on the
>> thing still reeks. There is no rush. I say we undeprecate the current
>> QueryParser and remove the wording in the new QueryParser about it replacing
>> the old one in 3.0. Later, if we think it should replace it (after having some
>> experience to judge from), we can reinstate the current plan. Anyone agree?
>>
>> --
>> - Mark
>>
>> http://www.lucidimagination.com
>>
>> ---------------------------------------------------------------------
>> To unsubscribe, e-mail: java-dev-unsubscribe@lucene.apache.org
>> For additional commands, e-mail: java-dev-help@lucene.apache.org