From: "Burton-West, Tom"
To: "dev@lucene.apache.org"
CC: "Dueber, William", "Farber, Phillip"
Subject: re: LUCENE-167 and Solr default handling of Boolean operators is broken
Date: Thu, 1 Dec 2011 17:51:06 +0000

The default query parser in Solr does not handle precedence of Boolean operators in the way most people expect.

"A AND B OR C" gets interpreted as "A AND (B OR C)". There are numerous other examples in the JIRA ticket for LUCENE-167, in this article on the wiki, http://wiki.apache.org/lucene-java/BooleanQuerySyntax, and in this blog post: http://robotlibrarian.billdueber.com/solr-and-boolean-operators/

This issue was reported in 2003, but the fix does not seem to have made it into the default query parser for either Lucene or Solr.

It appears that LUCENE-167 was closed in 2009 based on the assumption that the query parser in LUCENE-1823 would become the default Lucene query parser. However, LUCENE-1823 seems to have gotten bogged down and is not yet resolved. I do see that there is a precedence query parser in LUCENE-1937, which was committed to contrib in the 3x branch (http://svn.apache.org/viewvc/lucene/dev/branches/branch_3x/lucene/contrib/queryparser/src/java/org/apache/lucene/queryParser/precedence/package.html?view=co).

Would it be possible to use the contrib 3x precedence query parser in Solr?
Would this require modifying the LuceneQParserPlugin, and if so, would it make sense to open a JIRA issue?

Are there any plans to make the precedence query parser the default for either Lucene or Solr?

If not, are there any plans to make it more prominent in the documentation that the default Lucene query parser has issues with precedence?

A bit more background below.

Tom Burton-West
http://www.hathitrust.org/blogs/large-scale-search
----------------------------------------------------

More Background

There were some concerns about breaking backward compatibility, but in a mailing list post in 2005 Yonik Seeley said:
"The current behavior is so surprising that I doubt that no one is relying on it."
(http://www.mail-archive.com/java-user@lucene.apache.org/msg00018.html)

and Doug Cutting said: "+1. Fixing operator precedence seems to me like an acceptable incompatibility. The change needs to be well documented in release notes, and the old QueryParser should be available, deprecated, for a time for back-compatibility."
(http://www.mail-archive.com/java-user@lucene.apache.org/msg00037.html)
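
To make the precedence difference concrete, here is a minimal sketch in plain Python. It is not Lucene or Solr code: the toy index, the document ids, and the docs() helper are assumptions made up purely for illustration. It compares the conventional reading, (A AND B) OR C, with the grouping described above, A AND (B OR C).

```python
# Illustrative only: plain Python sets stand in for a search index.
# This is NOT Lucene or Solr code; all names and data are hypothetical.

# Toy index: term -> set of ids of the documents that contain the term.
index = {
    "A": {1, 2},
    "B": {2, 3},
    "C": {3, 4},
}


def docs(term):
    """Return the set of document ids containing the given term."""
    return index.get(term, set())


# Conventional precedence (AND binds tighter than OR): (A AND B) OR C
conventional = (docs("A") & docs("B")) | docs("C")

# Grouping described in the message: A AND (B OR C)
reported = docs("A") & (docs("B") | docs("C"))

print(sorted(conventional))  # [2, 3, 4]
print(sorted(reported))      # [2]
```

With this toy index, documents 3 and 4 match only under the conventional reading, so the choice of grouping changes which results a user sees for the same query string.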