Message-ID: <53761D46.5030201@elyograg.org>
Date: Fri, 16 May 2014 08:14:30 -0600
From: Shawn Heisey <solr@elyograg.org>
To: solr-user@lucene.apache.org
Subject: Re: Difference between search strings
References: <1400073330043-4135571.post@n3.nabble.com>
In-Reply-To: <1400073330043-4135571.post@n3.nabble.com>

On 5/14/2014 7:15 AM, nativecoder wrote:
> Can someone please tell me the difference between searching a text in the
> following ways
> 
> 1. q=Exact_Word:"samplestring" -> What does it tell to solr  ?
> 
> 2. q=samplestring&qf=Exact_Word -> What does it tell to solr  ?
> 
> 3. q="samplestring"&qf=Exact_Word -> What does it tell to solr  ?
>  
> I think the first and the third one are the same.  is it correct ? How does
> it differ from the second one.
> 
> I am trying to understand how enclosing the full term in "" is resolving the
> solr specific special character problem? What does it tell to solr  ? e.g If
> there is "!" mark in the string solr will identify it as a NOT, "!" is part
> of the string. This issue can be corrected if the full string is enclosed in
> a "". 

Quotes surrounding a Solr query turn it into a phrase query.  For fields
where the entire text is a single token, this becomes an exact match.
For tokenized fields, it means that term positions in the index and the
query will be compared -- so the query terms will need to be next to
each other and in that specific order in the indexed data.
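
As a rough illustration of that position comparison, here is a toy
model in Python. It is only a sketch: phrase_matches is a made-up
helper, and real Solr analysis (slop, position increments, filters) is
ignored entirely.

```python
def phrase_matches(indexed_tokens, query_tokens):
    """Toy phrase match: True if query_tokens appear consecutively,
    in the same order, somewhere in indexed_tokens."""
    n = len(query_tokens)
    if n == 0:
        return False
    return any(
        indexed_tokens[i:i + n] == query_tokens
        for i in range(len(indexed_tokens) - n + 1)
    )

# Tokenized field: the value "quick brown fox" became three tokens.
tokenized = ["quick", "brown", "fox"]

# Single-token field: the whole value is one token, so only the
# complete string can match.
single = ["quick brown fox"]
```

With these definitions, the phrase "quick brown" matches the tokenized
field, "brown quick" does not, and the single-token field matches only
the full value.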

Your first and third examples should parse the same, although the third
one only works with the dismax and edismax parsers.  The first one would
work correctly with the standard parser and the edismax parser, but not
the dismax parser.
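
To make the parser dependency concrete, here is a small Python sketch
that builds the three query strings with an explicit defType.  The
build_params helper is hypothetical, and the choice of "lucene" (the
standard parser) and "edismax" for each request is my assumption about
how you would send them, not something from your message.

```python
from urllib.parse import urlencode

def build_params(q, def_type, qf=None):
    """Return a URL-encoded Solr query string (hypothetical helper)."""
    params = {"q": q, "defType": def_type}
    if qf is not None:
        params["qf"] = qf
    return urlencode(params)

# 1. Field named inside q: works with the standard (lucene) parser.
fielded = build_params('Exact_Word:"samplestring"', "lucene")

# 2. Bare term, field supplied through qf: needs dismax or edismax.
bare_qf = build_params("samplestring", "edismax", qf="Exact_Word")

# 3. Quoted phrase, field supplied through qf: needs dismax or edismax.
phrase_qf = build_params('"samplestring"', "edismax", qf="Exact_Word")
```

The qf parameter means nothing to the standard parser, which is why
the second and third forms depend on dismax or edismax.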

Quotes will *also* eliminate the need to escape characters that would
normally require backslash escaping.  For single-token fields where
you're doing an exact match, quotes will also preserve spaces in the
query.  If you need an actual quote character inside the quoted query,
that character still has to be escaped with a backslash.
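
A minimal Python sketch of client-side quoting along those lines.  The
quote_phrase name is made up, and it assumes that a backslash is the
escape character inside the quotes, so literal backslashes are escaped
as well as quote characters.

```python
def quote_phrase(value):
    """Wrap value in double quotes for use as a phrase query.

    Only the backslash and the double quote are escaped; other
    special characters such as "!" are left untouched.
    """
    escaped = value.replace("\\", "\\\\").replace('"', '\\"')
    return '"' + escaped + '"'

# An exclamation point and a space pass through unchanged.
plain = quote_phrase("sample string!")

# An embedded quote character gets a backslash in front of it.
embedded = quote_phrase('say "hi"')
```

The backslash has to be replaced first, otherwise the backslashes added
for the quote characters would themselves be doubled.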

As for the problem you are having with the exclamation point -- the Solr
analysis page indicates that KeywordTokenizer does *not* split on
exclamation points.  The only thing I am aware of that uses exclamation
points for splitting is explicit document routing in SolrCloud.  If the
field you are using is the uniqueKey for your index and you are running
SolrCloud, then text before an exclamation point is used for document
routing.  Note:  You should not use a solr.TextField type for your
uniqueKey field; it should be solr.StrField.  If you use
solr.StrField, then you cannot have an analysis chain with a tokenizer,
so any possible confusion about what KeywordTokenizer does would disappear.
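
To show which part of a uniqueKey value I mean, here is a tiny Python
sketch.  The route_prefix function is purely illustrative: it only
picks out the text before the first exclamation point, and it does not
reproduce the hashing SolrCloud does with that text or any more
elaborate routing syntax.

```python
def route_prefix(doc_id):
    """Return the text before the first "!" in doc_id, or None when
    the id contains no exclamation point."""
    prefix, sep, _rest = doc_id.partition("!")
    return prefix if sep else None

# A hypothetical id with a routing prefix, and one without.
with_prefix = route_prefix("tenantA!doc42")
without_prefix = route_prefix("doc42")
```

Here with_prefix is "tenantA" and without_prefix is None.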

Thanks,
Shawn