lucene-solr-user mailing list archives

From John Blythe <j...@curvolabs.com>
Subject Re: simple matches not catching at query time
Date Tue, 11 Apr 2017 20:32:57 GMT
first off, i don't think i have a full handle on the meaning of what the
debugger outputs.

that said, if "...PhraseQuery(manufacturer_split_syn:\"vendor vendor\")" is
matching against `vendor_coolmed | coolmed | vendor`, then 'vendor' should
match. the query analyzer is, in order: keywordtokenizer; pattern replacement
(replaces every non-alphanumeric character with an underscore); synonym
lookup (the underscores are my way around the multi-term synonym problem);
worddelimiter, which splits on the underscores and generates word parts
("vendor_vendor" => 'vendor' 'vendor'); stop filter; lower case; stemmer.
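
As a rough illustration of the chain described above, and of how a phrase with no slop is checked against indexed positions, here is a minimal Python sketch. It is not Solr's implementation: `analyze_query`, `phrase_matches`, and the token positions assigned to the indexed terms are assumptions made for illustration, and the synonym, stop-word, and stemming steps are left out.

```python
import re


def analyze_query(text):
    """Simplified stand-in for the described query-time chain."""
    # KeywordTokenizer: the whole input stays one token.
    token = text
    # Pattern replacement: every non-alphanumeric character becomes "_".
    token = re.sub(r"[^A-Za-z0-9]", "_", token)
    # Word delimiter: split on underscores into word parts.
    parts = [p for p in token.split("_") if p]
    # Lower-casing (synonyms, stop words and stemming are omitted here).
    return [p.lower() for p in parts]


def phrase_matches(indexed, phrase):
    """Exact phrase check: term i must sit at position start + i.

    indexed: list of (term, position) pairs for one field value.
    phrase:  non-empty list of query terms, in order.
    """
    positions = {}
    for term, pos in indexed:
        positions.setdefault(term, set()).add(pos)
    starts = positions.get(phrase[0], set())
    return any(
        all(start + i in positions.get(term, set())
            for i, term in enumerate(phrase))
        for start in starts
    )


# Index-side tokens from the Analysis tab; the positions are assumed.
indexed = [("vendor_coolmed", 0), ("coolmed", 0), ("vendor", 1)]

query_terms = analyze_query("Vendor:Vendor")
print(query_terms)
print(phrase_matches(indexed, query_terms))
print(phrase_matches(indexed, ["vendor"]))
```

Under these assumed positions the two-term phrase needs a second `vendor` immediately after the first, which the field does not contain, while the single term `vendor` does occur in it.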

in your mentioned strategy, what is the "id:<expected>" representative of?
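
For reference, Solr's `explainOther` parameter takes a query that selects the document(s) to be explained relative to the main query, so `id:<expected>` presumably stands for the unique key of the document that was expected to match. A minimal sketch of building such a request follows; the host, the core name `products`, the document id `123`, and the helper `explain_other_url` are hypothetical.

```python
from urllib.parse import urlencode


def explain_other_url(base_url, main_query, expected_id):
    """Build a /select URL that asks Solr to explain one specific document."""
    params = {
        "q": main_query,                      # the simplest filter query, used as the main query
        "debugQuery": "true",                 # turn on debug output
        "explainOther": f"id:{expected_id}",  # the document expected to match
        "wt": "json",
    }
    return f"{base_url}/select?{urlencode(params)}"


# Hypothetical core and document id.
url = explain_other_url(
    "http://localhost:8983/solr/products",
    'manufacturer_split_syn:"Vendor:Vendor US"',
    "123",
)
print(url)
```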

thanks!

-- 
*John Blythe*
Product Manager & Lead Developer

251.605.3071 | john@curvolabs.com
www.curvolabs.com

58 Adams Ave
Evansville, IN 47713

On Tue, Apr 11, 2017 at 4:12 PM, Mikhail Khludnev <mkhl@apache.org> wrote:

> John,
>
> How do you expect to match any of "parsed_filter_queries":["
> MultiPhraseQuery(manufacturer_syn_both:\"(vendor_vendor_us vendor)
> vendor\")", "PhraseQuery(manufacturer_split_syn:\"vendor vendor\")"
> against
> vendor_coolmed | coolmed | vendor ?
>
> I just can't see any chance to match them.
>
> One possible strategy is to pick the simplest filter query and make it the
> main query.
> Then pass &explainOther=id:<expected> and share the explanation.
>
>
>
> On Tue, Apr 11, 2017 at 8:57 PM, John Blythe <john@curvolabs.com> wrote:
>
> > hi, erick.
> >
> > appreciate the feedback.
> >
> > 1> i'm sending the terms to solr enquoted
> > 2> i'd thought that at one point and reran the indexing. i _had_ had two of
> > the fields not indexed, but this represented one pass (same analyzer) from
> > two diff source fields while 2 or 3 of the other 4 fields _were_ seeming as
> > if they should match. maybe just need to do this for said sanity at this
> > point lol
> > 3> i'm using dismax, no mm param set
> >
> > some further context:
> >
> > i'm querying something like this: ...fq=manufacturer:("VENDOR:VENDOR US")
> > OR manufacturer_syn:("VENDOR:VENDOR US")...
> >
> > The indexed value is: "Vendor"
> >
> > the output of field 1 in the Analysis tab would be:
> > *index*: vendor_coolmed | coolmed | vendor
> > *query*: vendor_vendor_coolmed | vendor | vendor
> >
> > the other field (and a couple other, related ones, actually) have similar
> > situations where I see a clear match (as well as get the confirmation of it
> > when switching to the old UI and seeing the highlighting) yet get no
> > results in my actual query.
> >
> > a further note. when i get the query debugging enabled I can see this in
> > the output:
> > "filter_queries":["manufacturer_syn_both:\"Vendor:Vendor US\"",
> > "manufacturer_split_syn:(\"Vendor:Vendor US\")"],
> > "parsed_filter_queries":["
> > MultiPhraseQuery(manufacturer_syn_both:\"(vendor_vendor_us vendor)
> > vendor\")", "PhraseQuery(manufacturer_split_syn:\"vendor vendor\")"],...
> >
> > It looks as if the parsed query is wrapped in quotes even after having been
> > parsed, so while the correct tokens, i.e. "vendor", are present to match
> > against the indexed value, the fact that the entire parsed derivative of
> > the initial query is sent to match (if that's indeed what's happening)
> > won't actually get any hits. Yet if I remove the quotes when sending over
> > to query then the parsing doesn't get to a point of having any
> > worthwhile/matching tokens to begin with.
> >
> > one last thing: i've attempted with just "vendor" being sent over to help
> > remove complexity and, once more, i see Analysis chain functioning just
> > fine but the query itself getting 0 hits.
> >
> > think TermsComponent is the best option at this point or something else
> > given the above filler info?
> >
> > On Tue, Apr 11, 2017 at 1:20 PM, Erick Erickson <erickerickson@gmail.com> wrote:
> >
> > > &debug=query is your friend. There are several issues that often trip
> > > people up:
> > >
> > > 1> The analysis tab pre-supposes that what you put in the boxes gets
> > > all the way to the field in question. Trivial example:
> > > I put (without quotes) "erick erickson" in the "name" field in the
> > > analysis page and see that it gets tokenized correctly. But the query
> > > "name:erick erickson" actually gets parsed at a higher level into
> > > name:erick default_search_field:erickson. See the discussion at:
> > > SOLR-9185
> > >
> > > 2> what you think is in your indexed field isn't really. Can happen if
> > > you changed your analysis chain but didn't totally re-index. Can
> > > happen because one of the parts of the analysis chain works
> > > differently than you expect (WordDelimiterFilterFactory, for instance,
> > > has a ton of options that can alter the tokens emitted). The
> > > TermsComponent will let you examine the terms actually _in_ the index
> > > that you search on. You stated that the analysis page shows you what
> > > you expect, so this is a sanity check.
> > >
> > > 3> You're using edismax and setting some parameter (mm=100% is a
> > > favorite) that is having this effect.
> > >
> > > So add debug=query and provide a sample document (or just a field) and
> > > the schema definition for the field in question if you're still
> > > stumped.
> > >
> > > Best,
> > > Erick
> > >
> > > > On Tue, Apr 11, 2017 at 8:35 AM, John Blythe <john@curvolabs.com> wrote:
> > > > hi everyone.
> > > >
> > > > i recently wrote in ('analysis matching, query not') but never heard back
> > > > so wanted to follow up. i'm at my wit's end currently. i have several
> > > > fields that are showing matches in the analysis tab. when i dumb down the
> > > > string sent over to query it still gives me issues in some field cases.
> > > >
> > > > any thoughts on how to debug to figure out wtf is going on here would be
> > > > greatly appreciated. the use case is straightforward and the solution
> > > > should be as well, so i'm at a loss as to how in the world i'm having
> > > > issues w this.
> > > >
> > > > can provide any amount of contextualizing information you need, just let me
> > > > know what could be beneficial.
> > > >
> > > > best,
> > > >
> > > > john
> > >
> >
>
>
>
> --
> Sincerely yours
> Mikhail Khludnev
>
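
The TermsComponent check suggested in the quoted thread (looking at the terms actually in the index rather than at the Analysis tab) could be requested roughly as follows. This is a sketch under assumptions: it presumes a `/terms` request handler is configured for the core, and the host, the core name `products`, and the helper `terms_url` are hypothetical.

```python
from urllib.parse import urlencode


def terms_url(base_url, field, prefix="", limit=20):
    """Build a TermsComponent request listing indexed terms for one field."""
    params = {
        "terms.fl": field,       # field whose indexed terms are listed
        "terms.prefix": prefix,  # only terms starting with this prefix
        "terms.limit": limit,    # maximum number of terms returned
        "wt": "json",
    }
    return f"{base_url}/terms?{urlencode(params)}"


# Hypothetical core; the field name is taken from the thread.
url = terms_url("http://localhost:8983/solr/products",
                "manufacturer_split_syn", prefix="vend")
print(url)
```

If the terms returned for the field differ from what the Analysis tab shows on the index side, that would point to a stale index or an analysis chain that behaves differently from what was expected.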
