lucene-solr-user mailing list archives

From Mikhail Khludnev <m...@apache.org>
Subject Re: simple matches not catching at query time
Date Tue, 11 Apr 2017 20:37:46 GMT
John,

Here I mean a query that matches the doc which you expect the problem query
to match.
https://cwiki.apache.org/confluence/display/solr/Common+Query+Parameters#CommonQueryParameters-TheexplainOtherParameter
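
For illustration only, such a request could be assembled along the lines of the sketch below. The host, the core name (mycore), and the id value are placeholders rather than anything from your setup, and the sketch assumes debugQuery=true is sent together with explainOther so that the explanation section is returned.

```python
from urllib.parse import urlencode


def build_explain_other_url(base_url, main_query, expected_doc_query):
    """Build a select URL asking Solr to explain another doc against the main query."""
    params = {
        "q": main_query,                     # the simplest filter query, used as the main query
        "debugQuery": "true",                # assumption: debug output is needed for explainOther
        "explainOther": expected_doc_query,  # a query matching the doc you expected to see
        "wt": "json",
    }
    return base_url + "?" + urlencode(params)


# Hypothetical host, core, and document id.
url = build_explain_other_url(
    "http://localhost:8983/solr/mycore/select",
    'manufacturer_split_syn:"Vendor:Vendor US"',
    "id:12345",
)
print(url)
```

The explanation for the expected doc then shows up in the debug part of the response, next to the parsed query.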

On Tue, Apr 11, 2017 at 11:32 PM, John Blythe <john@curvolabs.com> wrote:

> first off, i don't think i have a full handle on the import of what is
> outputted by the debugger.
>
> that said, if "...PhraseQuery(manufacturer_split_syn:\"vendor vendor\")" is
> matching against `vendor_coolmed | coolmed | vendor`, then 'vendor' should
> match. the query analyzer is keywordtokenizer, pattern replacement
> (replaces all non-alphanumeric with underscores), checks for synonyms (the
> underscores are my way around the multi term synonym problem), then
> worddelimiter is used to blow out the underscores and generate word parts
> ("vendor_vendor" => 'vendor' 'vendor'), stop filter, lower case, stem.
>
> in your mentioned strategy, what is the "id:<expected>" representative of?
>
> thanks!
>
> --
> *John Blythe*
> Product Manager & Lead Developer
>
> 251.605.3071 | john@curvolabs.com
> www.curvolabs.com
>
> 58 Adams Ave
> Evansville, IN 47713
>
> On Tue, Apr 11, 2017 at 4:12 PM, Mikhail Khludnev <mkhl@apache.org> wrote:
>
> > John,
> >
> > How do you expect any of
> > "parsed_filter_queries":["MultiPhraseQuery(manufacturer_syn_both:\"(vendor_vendor_us vendor) vendor\")",
> > "PhraseQuery(manufacturer_split_syn:\"vendor vendor\")"
> > to match against vendor_coolmed | coolmed | vendor ?
> >
> > I just can't see any chance to match them.
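
A rough sketch of the reasoning, not how Lucene implements it: a phrase query needs its terms at consecutive positions in the indexed field. The sketch assumes each indexed token sits at its own consecutive position, which is a simplification (synonyms can share a position). The function name phrase_matches is made up for the example.

```python
def phrase_matches(indexed_tokens, phrase):
    """Return True if `phrase` occurs as consecutive tokens in `indexed_tokens`."""
    n = len(phrase)
    return any(
        indexed_tokens[i:i + n] == phrase
        for i in range(len(indexed_tokens) - n + 1)
    )


# Tokens as shown for the index side in the Analysis tab.
indexed = ["vendor_coolmed", "coolmed", "vendor"]

print(phrase_matches(indexed, ["vendor", "vendor"]))  # False: only one "vendor" is indexed
print(phrase_matches(indexed, ["vendor"]))            # True: a single term is present
```

Under this simplified model the two-term phrase has nothing to match, while a single "vendor" term would. If the single-term query also returns nothing, the terms actually stored in the index are worth checking.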
> >
> > One possible strategy is to pick the simplest filter query and use it as
> > the main query.
> > Then pass &explainOther=id:<expected> and share the explanation.
> >
> >
> >
> > On Tue, Apr 11, 2017 at 8:57 PM, John Blythe <john@curvolabs.com> wrote:
> >
> > > hi, erick.
> > >
> > > appreciate the feedback.
> > >
> > > 1> i'm sending the terms to solr enquoted
> > > 2> i'd thought that at one point and reran the indexing. i _had_ had two
> > > of the fields not indexed, but this represented one pass (same analyzer)
> > > from two diff source fields while 2 or 3 of the other 4 fields _were_
> > > seeming as if they should match. maybe just need to do this for said
> > > sanity at this point lol
> > > 3> i'm using dismax, no mm param set
> > >
> > > some further context:
> > >
> > > i'm querying something like this: ...fq=manufacturer:("VENDOR:VENDOR US")
> > > OR manufacturer_syn:("VENDOR:VENDOR US")...
> > >
> > > The indexed value is: "Vendor"
> > >
> > > the output of field 1 in the Analysis tab would be:
> > > *index*: vendor_coolmed | coolmed | vendor
> > > *query*: vendor_vendor_coolmed | vendor | vendor
> > >
> > > the other field (and a couple other, related ones, actually) have similar
> > > situations where I see a clear match (as well as get the confirmation of it
> > > when switching to the old UI and seeing the highlighting) yet get no
> > > results in my actual query.
> > >
> > > a further note. when i get the query debugging enabled I can see this in
> > > the output:
> > > "filter_queries":["manufacturer_syn_both:\"Vendor:Vendor US\"",
> > > "manufacturer_split_syn:(\"Vendor:Vendor US\")"],
> > > "parsed_filter_queries":["MultiPhraseQuery(manufacturer_syn_both:\"(vendor_vendor_us vendor) vendor\")",
> > > "PhraseQuery(manufacturer_split_syn:\"vendor vendor\")"],...
> > >
> > > It looks as if the parsed query is wrapped in quotes even after having been
> > > parsed, so while the correct tokens, i.e. "vendor", are present to match
> > > against the indexed value, the fact that the entire parsed derivative of
> > > the initial query is sent to match (if that's indeed what's happening)
> > > won't actually get any hits. Yet if I remove the quotes when sending over
> > > to query then the parsing doesn't get to a point of having any
> > > worthwhile/matching tokens to begin with.
> > >
> > > one last thing: i've attempted with just "vendor" being sent over to help
> > > remove complexity and, once more, i see Analysis chain functioning just
> > > fine but the query itself getting 0 hits.
> > >
> > > think TermsComponent is the best option at this point or something else
> > > given the above filler info?
> > >
> > > --
> > > *John Blythe*
> > > Product Manager & Lead Developer
> > >
> > > 251.605.3071 | john@curvolabs.com
> > > www.curvolabs.com
> > >
> > > 58 Adams Ave
> > > Evansville, IN 47713
> > >
> > > On Tue, Apr 11, 2017 at 1:20 PM, Erick Erickson <erickerickson@gmail.com>
> > > wrote:
> > >
> > > > &debug=query is your friend. There are several issues that often trip
> > > > people up:
> > > >
> > > > 1> The analysis tab pre-supposes that what you put in the boxes gets
> > > > all the way to the field in question. Trivial example:
> > > > I put (without quotes) "erick erickson" in the "name" field in the
> > > > analysis page and see that it gets tokenized correctly. But the query
> > > > "name:erick erickson" actually gets parsed at a higher level into
> > > > name:erick default_search_field:erickson. See the discussion at:
> > > > SOLR-9185
> > > >
> > > > 2> what you think is in your indexed field isn't really. Can happen if
> > > > you changed your analysis chain but didn't totally re-index. Can
> > > > happen because one of the parts of the analysis chain works
> > > > differently than you expect (WordDelimiterFilterFactory, for instance,
> > > > has a ton of options that can alter the tokens emitted). The
> > > > TermsComponent will let you examine the terms actually _in_ the index
> > > > that you search on. You stated that the analysis page shows you what
> > > > you expect, so this is a sanity check.
> > > >
> > > > 3> You're using edismax and setting some parameter, mm=100% is a
> > > > favorite and it's having this effect.
> > > >
> > > > So add debug=query and provide a sample document (or just a field) and
> > > > the schema definition for the field in question if you're still
> > > > stumped.
> > > >
> > > > Best,
> > > > Erick
> > > >
> > > > On Tue, Apr 11, 2017 at 8:35 AM, John Blythe <john@curvolabs.com> wrote:
> > > > > hi everyone.
> > > > >
> > > > > i recently wrote in ('analysis matching, query not') but never heard back
> > > > > so wanted to follow up. i'm at my wit's end currently. i have several
> > > > > fields that are showing matches in the analysis tab. when i dumb down the
> > > > > string sent over to query it still gives me issues in some field cases.
> > > > >
> > > > > any thoughts on how to debug to figure out wtf is going on here would be
> > > > > greatly appreciated. the use case is straightforward and the solution
> > > > > should be as well, so i'm at a loss as to how in the world i'm having
> > > > > issues w this.
> > > > >
> > > > > can provide any amount of contextualizing information you need, just let me
> > > > > know what could be beneficial.
> > > > >
> > > > > best,
> > > > >
> > > > > john
> > > >
> > >
> >
> >
> >
> > --
> > Sincerely yours
> > Mikhail Khludnev
> >
>



-- 
Sincerely yours
Mikhail Khludnev
