From: Walter Underwood
Date: Thu, 07 Feb 2008 11:12:42 -0800
Subject: Re: Query with literal quote character: 6'2"
Reply-To: solr-user@lucene.apache.org

How about the query parser respecting backslash escaping? I need free-text
input, no syntax at all. Right now, I'm escaping every Lucene special
character in the front end. I just figured out that it breaks for colon:
I can't search for "12:01" with "12\:01".

wunder

On 2/7/08 11:06 AM, "Chris Hostetter" wrote:

>
> : I confirmed this behavior in trunk with the following query:
> :
> http://localhost:8983/solr/select?qt=dismax&q=6'2"&debugQuery=on&qf=cat&pf=cat
> :
> : The result is that the double quote is dropped:
> : +DisjunctionMaxQuery((cat:6'2)~0.01) DisjunctionMaxQuery((cat:6'2)~0.01)
> :
> : This seems like it's a bug (rather than by design), but I could be
> : wrong... Hoss?
>
> It was by design ... but it could be handled better. the idea is that if
> the input has balanced quotes (ie: an even number) then leave them alone
> so they are dealt with as phrase delimiters. If there is an uneven number
> strip them out since we don't know whether they are a mistake (ie: unclosed
> phrase) or intended to be literal.
>
> auto-escaping them probably would have been a better way to go (ie: let
> the analyzer decide whether or not to strip them) ... i'm not sure why i
> didn't do that in the first place (I think at the time the lucene
> QueryParser didn't deal with escaped quotes very well)
>
> the thing to keep in mind, is that even if it did escape them, this still
> wouldn't work if the user input were...
>
>   the 6'2" man dating the 5'3" woman
>
> ...because it would assume the even number of double-quote characters means
> that " man dating the 5'3" is a phrase. i remember spending a day
> going over query logs trying to figure out a good set of heuristic rules
> for guessing when quote characters in user input should be interpreted as
> phrase delims vs "inch" markers before a coworker smacked me and made me
> realize it was a fairly intractable problem and simple rules would be
> easier to understand anyway.
>
> FYI: this is all happening in
> SolrPluginUtils.stripUnbalancedQuotes(CharSequence) which
> DisMax(RequestHandler) calls before passing the string to
> DisjunctionMaxQueryParser.
>
> -Hoss
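Walter's front-end workaround amounts to putting a backslash before every character that Lucene's classic query syntax treats as special. A minimal Python sketch follows. The function name `escape_lucene` is hypothetical, and the character set is an assumption taken from the Lucene query parser syntax documentation of that era (`&&` and `||` are covered by escaping `&` and `|` one character at a time). It only produces the escaped string; whether Solr then honors `12\:01` depends on the query parser, which is exactly what this thread is questioning.

```python
# Characters with special meaning in Lucene's classic query syntax
# (assumed from the Lucene query parser syntax docs; "&&" and "||"
# are handled by escaping "&" and "|" individually).
LUCENE_SPECIAL = set('\\+-!():^[]"{}~*?|&')


def escape_lucene(text: str) -> str:
    """Backslash-escape every Lucene special character in free-text input.

    Hypothetical helper illustrating front-end escaping; not a Solr API.
    """
    return "".join("\\" + ch if ch in LUCENE_SPECIAL else ch for ch in text)


if __name__ == "__main__":
    print(escape_lucene("12:01"))     # 12\:01
    print(escape_lucene('6\'2"'))     # 6'2\"
```

Escaping character by character keeps the helper independent of any query parser; the catch, as the thread shows, is that the receiving parser must treat the backslash as an escape rather than strip or misinterpret it.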
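The rule Hoss describes for SolrPluginUtils.stripUnbalancedQuotes can be sketched as follows. This is a hypothetical Python re-implementation based only on his description (leave double quotes alone when their count is even, otherwise strip them all), not Solr's source. `escape_unbalanced_quotes` is the auto-escaping alternative he says would probably have been better, and `quoted_phrases` is a hypothetical helper showing how balanced quotes end up read as phrase delimiters. The last example reproduces the failure he points out: `the 6'2" man dating the 5'3" woman` has an even number of quotes, so it is left untouched and ` man dating the 5'3` becomes a phrase.

```python
from __future__ import annotations


def strip_unbalanced_quotes(s: str) -> str:
    """Keep double quotes when their count is even; otherwise drop them all.

    Hypothetical re-implementation of the rule described for
    SolrPluginUtils.stripUnbalancedQuotes; not Solr's actual source.
    """
    if s.count('"') % 2 == 0:
        return s  # balanced: later parsed as phrase delimiters
    return s.replace('"', "")  # unbalanced: can't tell a typo from a literal


def escape_unbalanced_quotes(s: str) -> str:
    """Hypothetical alternative: backslash-escape unbalanced quotes
    instead of stripping them, leaving the analyzer to decide."""
    if s.count('"') % 2 == 0:
        return s
    return s.replace('"', '\\"')


def quoted_phrases(s: str) -> list[str]:
    """Text between successive pairs of double quotes, i.e. what a parser
    treating balanced quotes as phrase delimiters would see as phrases.
    Returns [] for unbalanced input."""
    parts = s.split('"')
    return parts[1::2] if len(parts) % 2 == 1 else []


if __name__ == "__main__":
    print(strip_unbalanced_quotes('6\'2"'))    # 6'2
    print(escape_unbalanced_quotes('6\'2"'))   # 6'2\"
    s = 'the 6\'2" man dating the 5\'3" woman'
    print(strip_unbalanced_quotes(s) == s)     # True (left alone)
    print(quoted_phrases(s))                   # [" man dating the 5'3"]
```

Both functions look only at the total count of `"` characters, which is why, as Hoss says, no rule this simple can tell an inch mark from a phrase delimiter.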