lucene-solr-user mailing list archives

From Sandeep Khanzode <sandeep_khanz...@yahoo.com.INVALID>
Subject Re: Wildcard searches with space in TextField/StrField
Date Thu, 24 Nov 2016 12:26:33 GMT
Hi,
This is the typical TextField with ...

<fieldType name="text123" class="solr.TextField" positionIncrementGap="100">
  <analyzer>
    <tokenizer class="solr.StandardTokenizerFactory"/>
    <filter class="solr.LowerCaseFilterFactory"/>
  </analyzer>
</fieldType>
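
A rough sketch, in Python, of the tokens an analyzer chain like this would index for the names discussed in this thread. It is only an approximation: the regular expression is a crude stand-in for StandardTokenizer's Unicode word-break rules, and `analyze` is a hypothetical helper name, not a Solr API.

```python
import re

def analyze(text):
    """Approximate StandardTokenizer followed by LowerCaseFilter."""
    # Assumption: splitting on runs of word characters is close enough to
    # the real tokenizer for simple names; punctuation is dropped.
    tokens = re.findall(r"\w+", text)
    # LowerCaseFilter equivalent: lowercase every token.
    return [t.lower() for t in tokens]

if __name__ == "__main__":
    for name in ("John D. Smith", "John D Smith", "John Doe"):
        print(name, "->", analyze(name))
```

Under this approximation "John D. Smith" and "John D Smith" produce the same three tokens, so the indexed term would be "d" with no period. That may be relevant when a wildcard term in the query still carries the period.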



SRK 

    On Thursday, November 24, 2016 1:38 AM, Reth RM <reth.iksam@gmail.com> wrote:
 

 What is the fieldType of those records?
On Tue, Nov 22, 2016 at 4:18 AM, Sandeep Khanzode <sandeep_khanzode@yahoo.com.invalid>
wrote:

Hi Erick,
I gave this a try. 
These are my results. There is a record with "John D. Smith", and another named "John Doe".

1.] {!complexphrase inOrder=true}name:"John D.*" ... does not fetch any results. 

2.] {!complexphrase inOrder=true}name:"John D*" ... fetches both results. 



Second observation: There is a record with "John D Smith"
1.] {!complexphrase inOrder=true}name:"John*" ... does not fetch any results. 

2.] {!complexphrase inOrder=true}name:"John D*" ... fetches that record. 

3.] {!complexphrase inOrder=true}name:"John D S*" ... fetches that record. 
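
For anyone who wants to replay these queries outside the admin UI, here is a minimal Python sketch that builds the request URLs. The base URL and the collection name `mycollection` are assumptions, and `build_select_url` is a hypothetical helper; only the `q` values are taken from the queries above.

```python
from urllib.parse import urlencode

# Assumption: a local Solr instance with a hypothetical collection name.
SOLR_SELECT = "http://localhost:8983/solr/mycollection/select"

def build_select_url(query, base=SOLR_SELECT):
    """Return a /select URL with the query string percent-encoded."""
    params = {"q": query, "wt": "json"}
    return base + "?" + urlencode(params)

if __name__ == "__main__":
    queries = (
        '{!complexphrase inOrder=true}name:"John*"',
        '{!complexphrase inOrder=true}name:"John D*"',
        '{!complexphrase inOrder=true}name:"John D S*"',
    )
    for q in queries:
        print(build_select_url(q))
```

Encoding the whole `q` value avoids problems with the braces, quotes, spaces and asterisks when the URL is pasted into a browser or passed to a command-line client. Adding `debugQuery=true` to the parameters should show the parsed query for each case.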

SRK

    On Sunday, November 13, 2016 7:43 AM, Erick Erickson <erickerickson@gmail.com>
wrote:


 Right, for that kind of use case you want complexPhraseQueryParser,
see: https://cwiki.apache.org/confluence/display/solr/Other+Parsers#OtherParsers-ComplexPhraseQueryParser

Best,
Erick

On Sat, Nov 12, 2016 at 9:39 AM, Sandeep Khanzode
<sandeep_khanzode@yahoo.com> wrote:
> Thanks, Erick.
>
> I am actually not trying to use the String field (prefer a TextField here).
> But, in my comparisons with TextField, it seems that something like phrase
> matching with whitespace and wildcard (like, 'my do*' or say, 'my dog*', or
> say, 'my dog has*') can only be accomplished with a string type field,
> especially because, with a WhitespaceTokenizer in TextField, the space will
> be lost, and all tokens will be individually considered. Am I missing
> something?
>
> SRK
>
>
> On Friday, November 11, 2016 10:05 PM, Erick Erickson
> <erickerickson@gmail.com> wrote:
>
>
> You have to query text and string fields differently, that's just the
> way it works. The problem is getting the query string through the
> parser as a _single_ token or as multiple tokens.
>
> Let's say you have a string field with the "a b" example. You have a
> single token
> a b that starts at offset 0.
>
> But with a text field, you have two tokens,
> a at position 0
> b at position 1
>
> But when the query parser sees "a b" (without quotes) it splits it
> into two tokens, and only the text field has both tokens so the string
> field won't match.
>
> OTOH, when the query parser sees "a\ b" it passes this through as a
> single token, which only matches the string field as there's no
> _single_ token "a b" in the text field.
>
> But a more interesting question is why you want to search this way.
> String fields are intended for keywords, machine-generated IDs and the
> like. They're pretty useless for searching anything except
> 1> exact tokens
> 2> prefixes
>
> While if you have "my dog has fleas" in a string field, you _can_
> search "*dog*" and get a hit but the performance is poor when you get
> a large corpus. Performance for "my*" will be pretty good though.
>
> All in all, this sounds like an XY problem. What's the use case you're
> trying to solve?
>
> Best,
> Erick
>
>
>
> On Thu, Nov 10, 2016 at 10:11 PM, Sandeep Khanzode
> <sandeep_khanzode@yahoo.com.invalid> wrote:
>> Hi Erick, Reth,
>>
>> The 'a\ b*' as well as the q.op=AND approach worked (successfully) only
>> for StrField for me.
>>
>> Any attempt at an 'a\ b*' query on a TextField does not match any
>> documents. The parsedQuery in debug mode does show 'field:a b*'. I am sure
>> there are documents that should match.
>> Another (maybe unrelated) observation is if I have 'field:a\ b', then the
>> parsedQuery is field:a field:b. Which does not match as expected (matches
>> individually).
>>
>> Can you please provide an example that I can use in Solr Query dashboard?
>> That will be helpful.
>>
>> I have also seen that wildcard queries work irrespective of field type,
>> i.e. StrField as well as TextField. That makes sense because a
>> WhitespaceTokenizer only creates word boundaries when we do not use an
>> EdgeNGramFilter. If I am not wrong, that is.
>>
>> SRK
>>
>>    On Friday, November 11, 2016 5:00 AM, Erick Erickson
>> <erickerickson@gmail.com> wrote:
>>
>>
>>  You can escape the space with a backslash as  'a\ b*'
>>
>> Best,
>> Erick
>>
>> On Thu, Nov 10, 2016 at 2:37 PM, Reth RM <reth.iksam@gmail.com> wrote:
>>> I don't think you can do wildcard on StrField. For text field, if your
>>> query is "category:(test m*)", the parsed query will be
>>> "category:test OR category:m*".
>>> You can add q.op=AND to make an AND between those terms.
>>>
>>> For phrase type wild card query support, as per docs, it
>>> is ComplexPhraseQueryParser that supports it. (I haven't tested it
>>> myself)
>>>
>>>
>>> https://cwiki.apache.org/confluence/display/solr/Other+Parsers#OtherParsers-ComplexPhraseQueryParser
>>>
>>> On Thu, Nov 10, 2016 at 11:40 AM, Sandeep Khanzode
>>> <sandeep_khanzode@yahoo.com.invalid> wrote:
>>>
>>>> Hi,
>>>> How does a search like abc* work in a StrField? Since the entire value
>>>> is stored as a single token, is it some type of trie structure that
>>>> allows such wildcard matching?
>>>> How can searches with a space, like 'a b*', be executed for text fields
>>>> (tokenized on whitespace)? If we specify this type of query, it is
>>>> broken down into two queries, field:a and field:b*. I would like them
>>>> to be contiguous, sort of like a phrase search with a wildcard.
>>>> SRK
>>
>>
>>
>
>


   



   