lucene-solr-user mailing list archives

From Reth RM <reth.ik...@gmail.com>
Subject Re: Wildcard searches with space in TextField/StrField
Date Wed, 23 Nov 2016 20:08:26 GMT
What is the fieldType of those records?
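
The field type matters because of the distinction Erick describes further down the thread: a string field indexes the whole value as one token, a whitespace-tokenized text field indexes one token per word, and the query parser splits on unescaped spaces. A rough sketch of that idea in Python follows. This is a toy model only, not Solr's actual parser or analysis chain; the function names (index_tokens, parse_query, matches) are made up for illustration, and it ignores wildcards, lowercasing and query-time analysis.

```python
def index_tokens(value, field_type):
    """Toy model: tokens (with positions) that a field would index."""
    if field_type == "string":
        # A string field keeps the whole value as a single token.
        return [(value, 0)]
    # A whitespace-tokenized text field yields one token per word.
    return [(tok, pos) for pos, tok in enumerate(value.split())]


def parse_query(q):
    """Toy model: split on unescaped whitespace; '\\ ' keeps the space in the term."""
    terms, current, i = [], "", 0
    while i < len(q):
        ch = q[i]
        if ch == "\\" and i + 1 < len(q):
            current += q[i + 1]  # escaped character is kept literally
            i += 2
            continue
        if ch.isspace():
            if current:
                terms.append(current)
                current = ""
        else:
            current += ch
        i += 1
    if current:
        terms.append(current)
    return terms


def matches(q, value, field_type):
    """True if any query term equals an indexed token (models the default OR)."""
    indexed = {tok for tok, _ in index_tokens(value, field_type)}
    return any(term in indexed for term in parse_query(q))


for query in ("a b", "a\\ b"):
    for ftype in ("text", "string"):
        print(repr(query), ftype, matches(query, "a b", ftype))
```

In this model the unescaped query only hits the text field and the escaped query only hits the string field, which is the behaviour described in the quoted explanation.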

On Tue, Nov 22, 2016 at 4:18 AM, Sandeep Khanzode <
sandeep_khanzode@yahoo.com.invalid> wrote:

> Hi Erick,
> I gave this a try.
> These are my results. There is a record with "John D. Smith", and another
> named "John Doe".
>
> 1.] {!complexphrase inOrder=true}name:"John D.*" ... does not fetch any
> results.
>
> 2.] {!complexphrase inOrder=true}name:"John D*" ... fetches both results.
>
>
>
> Second observation: There is a record with "John D Smith"
> 1.] {!complexphrase inOrder=true}name:"John*" ... does not fetch any
> results.
>
> 2.] {!complexphrase inOrder=true}name:"John D*" ... fetches that record.
>
> 3.] {!complexphrase inOrder=true}name:"John D S*" ... fetches that record.
>
> SRK
>
>     On Sunday, November 13, 2016 7:43 AM, Erick Erickson <
> erickerickson@gmail.com> wrote:
>
>
>  Right, for that kind of use case you want complexPhraseQueryParser,
> see: https://cwiki.apache.org/confluence/display/solr/Other+Parsers#OtherParsers-ComplexPhraseQueryParser
>
> Best,
> Erick
>
> On Sat, Nov 12, 2016 at 9:39 AM, Sandeep Khanzode
> <sandeep_khanzode@yahoo.com> wrote:
> > Thanks, Erick.
> >
> > I am actually not trying to use the String field (prefer a TextField here).
> > But, in my comparisons with TextField, it seems that something like phrase
> > matching with whitespace and wildcard (like, 'my do*' or say, 'my dog*', or
> > say, 'my dog has*') can only be accomplished with a string type field,
> > especially because, with a WhitespaceTokenizer in TextField, the space will
> > be lost, and all tokens will be individually considered. Am I missing
> > something?
> >
> > SRK
> >
> >
> > On Friday, November 11, 2016 10:05 PM, Erick Erickson
> > <erickerickson@gmail.com> wrote:
> >
> >
> > You have to query text and string fields differently, that's just the
> > way it works. The problem is getting the query string through the
> > parser as a _single_ token or as multiple tokens.
> >
> > Let's say you have a string field with the "a b" example. You have a
> > single token
> > a b that starts at offset 0.
> >
> > But with a text field, you have two tokens,
> > a at position 0
> > b at position 1
> >
> > But when the query parser sees "a b" (without quotes) it splits it
> > into two tokens, and only the text field has both tokens so the string
> > field won't match.
> >
> > OTOH, when the query parser sees "a\ b" it passes this through as a
> > single token, which only matches the string field as there's no
> > _single_ token "a b" in the text field.
> >
> > But a more interesting question is why you want to search this way.
> > String fields are intended for keywords, machine-generated IDs and the
> > like. They're pretty useless for searching anything except
> > 1> exact tokens
> > 2> prefixes
> >
> > While if you have "my dog has fleas" in a string field, you _can_
> > search "*dog*" and get a hit but the performance is poor when you get
> > a large corpus. Performance for "my*" will be pretty good though.
> >
> > All in all, this sounds like an XY problem. What's the use case you're
> > trying to solve?
> >
> > Best,
> > Erick
> >
> >
> >
> > On Thu, Nov 10, 2016 at 10:11 PM, Sandeep Khanzode
> > <sandeep_khanzode@yahoo.com.invalid> wrote:
> >> Hi Erick, Reth,
> >>
> >> The 'a\ b*' as well as the q.op=AND approach worked (successfully) only
> >> for StrField for me.
> >>
> >> Any attempt at creating an 'a\ b*' query for a TextField does not match any
> >> documents. The parsedQuery in debug mode does show 'field:a b*'. I am sure
> >> there are documents that should match.
> >> Another (maybe unrelated) observation is that if I have 'field:a\ b', then
> >> the parsedQuery is field:a field:b, which does not match as expected
> >> (matches individually).
> >>
> >> Can you please provide an example that I can use in Solr Query dashboard?
> >> That will be helpful.
> >>
> >> I have also seen that wildcard queries work irrespective of field type
> >> i.e. StrField as well as TextField. That makes sense because a
> >> WhitespaceTokenizer only creates word boundaries when we do not use an
> >> EdgeNGramFilter. If I am not wrong, that is.
> >> SRK
> >>
> >>    On Friday, November 11, 2016 5:00 AM, Erick Erickson
> >> <erickerickson@gmail.com> wrote:
> >>
> >>
> >>  You can escape the space with a backslash as  'a\ b*'
> >>
> >> Best,
> >> Erick
> >>
> >> On Thu, Nov 10, 2016 at 2:37 PM, Reth RM <reth.iksam@gmail.com> wrote:
> >>> I don't think you can do wildcard on StrField. For text field, if your
> >>> query is "category:(test m*)"  the parsed query will be  "category:test OR
> >>> category:m*"
> >>> You can add q.op=AND to make an AND between those terms.
> >>>
> >>> For phrase type wild card query support, as per docs, it is
> >>> ComplexPhraseQueryParser that supports it. (I haven't tested it myself)
> >>>
> >>>
> >>> https://cwiki.apache.org/confluence/display/solr/Other+Parsers#OtherParsers-ComplexPhraseQueryParser
> >>>
> >>> On Thu, Nov 10, 2016 at 11:40 AM, Sandeep Khanzode <
> >>> sandeep_khanzode@yahoo.com.invalid> wrote:
> >>>
> >>>> Hi,
> >>>> How does a search like abc* work in StrField? Since the entire thing is
> >>>> stored as a single token, is it a type of a trie structure that allows
> >>>> such wildcard matching?
> >>>> How can searches with space like 'a b*' be executed for text fields
> >>>> (tokenized on whitespace)? If we specify this type of query, it is broken
> >>>> down into two queries with field:a and field:b*. I would like them to be
> >>>> contiguous, sort of, like a phrase search with wild card.
> >>>> SRK
> >>
> >>
> >>
> >
> >
>
>
>
>
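
For anyone who wants to reproduce the complexphrase queries quoted above from a script rather than the admin UI, here is a minimal sketch that only builds the request URL (it does not contact Solr). The host, port and collection name in SOLR_SELECT are placeholders I made up, and complexphrase_url is a hypothetical helper; the {!complexphrase inOrder=true} local-params syntax and the "name" field are taken from the thread, and debugQuery=true is added so the response includes the parsed query.

```python
from urllib.parse import urlencode, urlsplit, parse_qs

# Hypothetical endpoint: adjust host, port and collection to your setup.
SOLR_SELECT = "http://localhost:8983/solr/mycollection/select"


def complexphrase_url(field, phrase, in_order=True):
    """Build a select URL for a complexphrase query such as name:"John D*"."""
    q = '{!complexphrase inOrder=%s}%s:"%s"' % (
        str(in_order).lower(),
        field,
        phrase,
    )
    # urlencode percent-escapes the braces, quotes, spaces and '*' for us.
    return SOLR_SELECT + "?" + urlencode({"q": q, "debugQuery": "true"})


url = complexphrase_url("name", "John D*")
print(url)

# Round-trip check: decoding the URL gives back the original query string.
decoded_q = parse_qs(urlsplit(url).query)["q"][0]
print(decoded_q)
```

Letting urlencode do the escaping avoids hand-escaping the space and the quotes, which is easy to get wrong when pasting such queries into a browser.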
