lucene-solr-user mailing list archives

From Sandeep Khanzode <>
Subject Re: Wildcard searches with space in TextField/StrField
Date Thu, 24 Nov 2016 12:26:33 GMT
This is the typical TextField with ...

  <fieldType name="text123" class="solr.TextField" positionIncrementGap="100">
    <analyzer>
      <tokenizer class="solr.StandardTokenizerFactory"/>
      <filter class="solr.LowerCaseFilterFactory"/>
    </analyzer>
  </fieldType>
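
To see why punctuation in the query can matter with this field type, here is a minimal Python sketch that approximates what the analyzer chain above does to the names discussed in this thread. It is only a stand-in: the `analyze` helper is hypothetical, and the regular expression is an assumption that is close enough for simple names, whereas the real StandardTokenizer follows Unicode word-break rules.

```python
import re

def analyze(text):
    """Rough stand-in for StandardTokenizer + LowerCaseFilter.

    Assumption: treating each run of ASCII letters/digits as a token is
    close enough for simple names; this is not Lucene's implementation.
    """
    return [tok.lower() for tok in re.findall(r"[A-Za-z0-9]+", text)]

for name in ("John D. Smith", "John Doe", "John D Smith"):
    print(name, "->", analyze(name))
```

If the index holds the term `d` rather than `d.`, a wildcard term such as `D.*` has no indexed term to expand to, which may be why the `"John D.*"` query reported in this thread returns nothing while `"John D*"` matches.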


    On Thursday, November 24, 2016 1:38 AM, Reth RM <> wrote:

 What is the fieldType of those records?
On Tue, Nov 22, 2016 at 4:18 AM, Sandeep Khanzode <> wrote:

Hi Erick,
I gave this a try. 
These are my results. There is a record with "John D. Smith", and another named "John Doe".

1.] {!complexphrase inOrder=true}name:"John D.*" ... does not fetch any results. 

2.] {!complexphrase inOrder=true}name:"John D*" ... fetches both results. 

Second observation: There is a record with "John D Smith"
1.] {!complexphrase inOrder=true}name:"John*" ... does not fetch any results. 

2.] {!complexphrase inOrder=true}name:"John D*" ... fetches that record. 

3.] {!complexphrase inOrder=true}name:"John D S*" ... fetches that record. 
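
For anyone who wants to repeat these tests outside the admin UI, the following is a small sketch of how the request parameters could be assembled in Python. The host, port, and core name (`mycore`) are hypothetical, and `complexphrase_params` is just an illustrative helper; `debugQuery=true` is included so the parsed query can be inspected.

```python
from urllib.parse import urlencode

def complexphrase_params(field, phrase, in_order=True):
    # The local-params prefix selects the ComplexPhraseQueryParser.
    q = '{!complexphrase inOrder=%s}%s:"%s"' % (
        str(in_order).lower(), field, phrase)
    return {"q": q, "debugQuery": "true"}

params = complexphrase_params("name", "John D*")

# Hypothetical host and core name; adjust to your installation.
url = "http://localhost:8983/solr/mycore/select?" + urlencode(params)
print(url)
```

Letting `urlencode` handle the braces, quotes, and asterisk avoids hand-escaping mistakes that can otherwise change what the parser sees.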


    On Sunday, November 13, 2016 7:43 AM, Erick Erickson <> wrote:

 Right, for that kind of use case you want complexPhraseQueryParser,
see: confluence/display/solr/Other+ Parsers#OtherParsers- ComplexPhraseQueryParser


On Sat, Nov 12, 2016 at 9:39 AM, Sandeep Khanzode
<> wrote:
> Thanks, Erick.
> I am actually not trying to use the String field (prefer a TextField here).
> But, in my comparisons with TextField, it seems that something like phrase
> matching with whitespace and wildcard (like, 'my do*' or say, 'my dog*', or
> say, 'my dog has*') can only be accomplished with a string type field,
> especially because, with a WhitespaceTokenizer in TextField, the space will
> be lost, and all tokens will be individually considered. Am I missing
> something?
> On Friday, November 11, 2016 10:05 PM, Erick Erickson
> <> wrote:
> You have to query text and string fields differently, that's just the
> way it works. The problem is getting the query string through the
> parser as a _single_ token or as multiple tokens.
> Let's say you have a string field with the "a b" example. You have a
> single token
> a b that starts at offset 0.
> But with a text field, you have two tokens,
> a at position 0
> b at position 1
> But when the query parser sees "a b" (without quotes) it splits it
> into two tokens, and only the text field has both tokens so the string
> field won't match.
> OTOH, when the query parser sees "a\ b" it passes this through as a
> single token, which only matches the string field as there's no
> _single_ token "a b" in the text field.
> But a more interesting question is why you want to search this way.
> String fields are intended for keywords, machine-generated IDs and the
> like. They're pretty useless for searching anything except
> 1> exact tokens
> 2> prefixes
> While if you have "my dog has fleas" in a string field, you _can_
> search "*dog*" and get a hit but the performance is poor when you get
> a large corpus. Performance for "my*" will be pretty good though.
> All in all, this sounds like an XY problem. What's the use case you're
> trying to solve?
> Best,
> Erick
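
The single-token versus multiple-token behaviour Erick describes above can be illustrated with a toy model. This is a sketch under simplifying assumptions, not how Solr is implemented: the "index" is just a Python set of tokens, the hypothetical `parse_query` helper splits on unescaped whitespace, and all query terms are required to match.

```python
import re

def index_string_field(value):
    # String field: the whole value is kept as one token.
    return {value}

def index_text_field(value):
    # Whitespace-tokenized text field: one token per word.
    return set(value.split())

def parse_query(q):
    """Split on unescaped whitespace, then drop the escaping backslash."""
    parts = re.split(r"(?<!\\)\s+", q.strip())
    return [p.replace("\\ ", " ") for p in parts]

def matches(index_terms, query_terms):
    # Toy AND semantics: every query term must be an indexed token.
    return all(t in index_terms for t in query_terms)

string_terms = index_string_field("a b")
text_terms = index_text_field("a b")

for q in ("a b", r"a\ b"):
    terms = parse_query(q)
    print(repr(q), terms,
          "string field:", matches(string_terms, terms),
          "text field:", matches(text_terms, terms))
```

In this model the unescaped query only matches the text field and the escaped query only matches the string field, which mirrors the explanation in the message.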
> On Thu, Nov 10, 2016 at 10:11 PM, Sandeep Khanzode
> <> wrote:
>> Hi Erick, Reth,
>> The 'a\ b*' as well as the q.op=AND approach worked (successfully) only
>> for StrField for me.
>> Any attempt at creating a 'a\ b*' for a TextField does not match any
>> documents. The parsedQuery in debug mode does show 'field:a b*'. I am sure
>> there are documents that should match.
>> Another (maybe unrelated) observation is if I have 'field:a\ b', then the
>> parsedQuery is field:a field:b, which does not match the way I expected (the
>> terms match individually).
>> Can you please provide an example that I can use in Solr Query dashboard?
>> That will be helpful.
>> I have also seen that wildcard queries work irrespective of field type,
>> i.e. StrField as well as TextField. That makes sense because a
>> WhitespaceTokenizer only creates word boundaries when we do not use an
>> EdgeNGramFilter. If I am not wrong, that is. SRK
>>    On Friday, November 11, 2016 5:00 AM, Erick Erickson
>> <> wrote:
>>  You can escape the space with a backslash as  'a\ b*'
>> Best,
>> Erick
>> On Thu, Nov 10, 2016 at 2:37 PM, Reth RM <> wrote:
>>> I don't think you can do wildcard on StrField. For text field, if your
>>> query is "category:(test m*)" the parsed query will be
>>> "category:test OR category:m*"
>>> You can add q.op=AND to make an AND between those terms.
>>> For phrase type wild card query support, as per docs, it
>>> is ComplexPhraseQueryParser that supports it. (I haven't tested it
>>> myself)
>>> confluence/display/solr/Other+ Parsers#OtherParsers-
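
The difference between the default OR and q.op=AND for a query like category:(test m*) can be sketched with a toy in-memory example. The documents and the `search` helper below are made up for illustration, and `fnmatchcase` merely stands in for wildcard term expansion; it is not how Solr evaluates the query.

```python
from fnmatch import fnmatchcase

# Hypothetical documents: id -> tokens of the "category" field.
docs = {
    1: ["test", "match"],
    2: ["test", "other"],
    3: ["misc"],
}

def term_hits(tokens, pattern):
    # A term (plain or wildcard) hits if any token matches it.
    return any(fnmatchcase(tok, pattern) for tok in tokens)

def search(patterns, op):
    combine = all if op == "AND" else any
    return sorted(
        doc_id for doc_id, tokens in docs.items()
        if combine(term_hits(tokens, p) for p in patterns)
    )

print("OR :", search(["test", "m*"], "OR"))
print("AND:", search(["test", "m*"], "AND"))
```

Note that even with AND, nothing here requires the two terms to be adjacent or in order, which is the gap a phrase-aware parser is meant to fill.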
>>> On Thu, Nov 10, 2016 at 11:40 AM, Sandeep Khanzode <>
>>> wrote:
>>>> Hi,
>>>> How does a search like abc* work in StrField? Since the entire thing is
>>>> stored as a single token, is it a type of a trie structure that allows
>>>> such wildcard matching?
>>>> How can searches with space like 'a b*' be executed for text fields
>>>> (tokenized on whitespace)? If we specify this type of query, it is
>>>> broken down into two queries with field:a and field:b*. I would like
>>>> them to be contiguous, sort of, like a phrase search with wild card.
>>>> SRK
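
On the question of how a prefix search such as abc* can be answered against single-token values: as far as I know, Lucene keeps the terms of a field in sorted order, so a prefix only needs a seek followed by a short scan, whereas a leading wildcard has to look at every term. The sketch below illustrates that idea with a plain sorted list and `bisect`; it is an assumption-laden stand-in, not Lucene's actual term dictionary, and the sample terms are made up.

```python
from bisect import bisect_left

# Hypothetical single-token (string field) values, kept sorted.
terms = sorted(["abc", "abcd", "abd", "my dog has fleas", "xyz"])

def prefix_search(sorted_terms, prefix):
    """Binary-search to the first candidate, then walk while the prefix holds."""
    out = []
    i = bisect_left(sorted_terms, prefix)
    while i < len(sorted_terms) and sorted_terms[i].startswith(prefix):
        out.append(sorted_terms[i])
        i += 1
    return out

def infix_search(sorted_terms, fragment):
    # No ordering to exploit: every term has to be inspected.
    return [t for t in sorted_terms if fragment in t]

print(prefix_search(terms, "abc"))
print(prefix_search(terms, "my"))
print(infix_search(terms, "dog"))
```

This is consistent with the remark earlier in the thread that "my*" performs well on a string field while "*dog*" gets slow on a large corpus.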

