lucene-solr-user mailing list archives

From Walter Underwood <wunderw...@netflix.com>
Subject Re: Query with literal quote character: 6'2"
Date Thu, 07 Feb 2008 19:12:42 GMT
How about the query parser respecting backslash escaping? I need
free-text input, no syntax at all. Right now, I'm escaping every
Lucene special character in the front end. I just figured out that
it breaks for the colon: I can't search for "12:01" with "12\:01".

wunder
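
For illustration, front-end escaping of the kind described above might look like the minimal Java sketch below. The class name, method name, and character set are assumptions made for this example (the set follows the single-character operators of the classic Lucene query syntax); Lucene itself provides QueryParser.escape(String) for the same job. The sketch only shows what the front end would send, and says nothing about whether the dismax handler honors the escapes.

```java
// Hypothetical helper, not part of Lucene or Solr.
final class LuceneEscapeSketch {
    // Assumed set of single characters that carry meaning in the classic
    // Lucene query syntax: \ + - ! ( ) : ^ [ ] " { } ~ * ? | &
    private static final String SPECIAL = "\\+-!():^[]\"{}~*?|&";

    // Prefix every special character with a backslash; copy the rest as-is.
    static String escape(String input) {
        StringBuilder out = new StringBuilder(input.length() + 8);
        for (int i = 0; i < input.length(); i++) {
            char c = input.charAt(i);
            if (SPECIAL.indexOf(c) >= 0) {
                out.append('\\');
            }
            out.append(c);
        }
        return out.toString();
    }

    public static void main(String[] args) {
        System.out.println(escape("12:01"));  // prints 12\:01
        System.out.println(escape("6'2\""));  // prints 6'2\"
    }
}
```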

On 2/7/08 11:06 AM, "Chris Hostetter" <hossman_lucene@fucit.org> wrote:

> 
> : I confirmed this behavior in trunk with the following query:
> : 
> http://localhost:8983/solr/select?qt=dismax&q=6'2"&debugQuery=on&qf=cat&pf=cat
> : 
> : The result is that the double quote is dropped:
> : +DisjunctionMaxQuery((cat:6'2)~0.01) DisjunctionMaxQuery((cat:6'2)~0.01)
> : 
> : This seems like it's a bug (rather than by design), but I could be
> : wrong... Hoss?
> 
> It was by design ... but it could be handled better.  the idea is that if
> the input has balanced quotes (ie: an even number) then leave them alone
> so they are dealt with as phrase delimiters.  If there is an uneven number
> strip them out since we don't know whether they are a mistake (ie: unclosed
> phrase) or intended to be literal.
> 
> auto-escaping them probably would have been a better way to go (ie: let
> the analyzer decide whether or not to strip them) ... i'm not sure why i
> didn't do that in the first place (I think at the time the lucene
> QueryParser didn't deal with escaped quotes very well)
> 
> the thing to keep in mind, is that even if it did escape them, this still
> wouldn't work if the user input were...
> 
>              the 6'2" man dating the 5'3" woman
> 
> ...because it would assume the even number of double-quote characters means
> that   " man dating the 5'3"  is a phrase.  i remember spending a day
> going over query logs trying to figure out a good set of heuristic rules
> for guessing when quote characters in user input should be interpreted as
> phrase delims vs "inch" markers before a coworker smacked me and made me
> realize it was a fairly intractable problem and simple rules would be
> easier to understand anyway.
> 
> FYI: this is all happening in
> SolrPluginUtils.stripUnbalancedQuotes(CharSequence) which
> DisMax(RequestHandler) calls before passing the string to
> DisjunctionMaxQueryParser.
> 
> 
> 
> -Hoss
> 
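
The balanced-quote rule described in the quoted message can be sketched in Java as follows. This illustrates the rule as stated there (even count: leave the quotes alone; odd count: strip them all) and is not a copy of the Solr source; the class name is hypothetical.

```java
// Hypothetical sketch of the rule described above, not the Solr implementation.
final class QuoteSketch {
    // If the input has an even number of double quotes, return it unchanged so
    // the quotes can act as phrase delimiters; otherwise remove every quote.
    static String stripUnbalancedQuotes(CharSequence s) {
        int count = 0;
        for (int i = 0; i < s.length(); i++) {
            if (s.charAt(i) == '"') {
                count++;
            }
        }
        if (count % 2 == 0) {
            return s.toString();
        }
        return s.toString().replace("\"", "");
    }

    public static void main(String[] args) {
        // One quote (odd): stripped.
        System.out.println(stripUnbalancedQuotes("6'2\""));  // prints 6'2
        // Two quotes (even): kept, so a parser would see a phrase between them.
        System.out.println(stripUnbalancedQuotes("the 6'2\" man dating the 5'3\" woman"));
    }
}
```

The second call shows the limitation raised in the message: the two inch marks balance each other, so the input passes through untouched and the text between them would be read as a phrase.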

