lucene-solr-user mailing list archives

From Joe Lawson <jlaw...@opensourceconnections.com>
Subject Re: Solr Cloud and Multi-word Synonyms :: synonym_edismax parser
Date Mon, 06 Jun 2016 20:57:01 GMT
Mary Jo,

It appears to be working correctly, but you have a very complex query
going on, so it can be confusing. Assuming you are using the queryParser
as provided in the examples, your query looks like "+sbc" when it enters
the queryParser and like "+((sbc)^2.0 (sb)^0.5 (small block)^0.5)" when
it comes out. That expanded query then enters the normal edismax
pipeline, where everything is processed as individual tokens.
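As a rough illustration of the rewrite step just described, here is a
minimal Python sketch. The function name expand_with_synonyms is
hypothetical and this is not the plugin's code; it only reproduces the
*shape* of the expanded query, assuming originalBoost=2.0,
synonymBoost=0.5 and the synonyms "sb" and "small block" for "sbc".

```python
# Hypothetical sketch: reproduces the shape of the synonym_edismax rewrite,
# not the hon_lucene_synonyms implementation itself.
def expand_with_synonyms(term: str, synonyms: list[str],
                         original_boost: float, synonym_boost: float) -> str:
    # The original term keeps its own clause with the original boost...
    clauses = [f"({term})^{original_boost}"]
    # ...and each synonym (multi-word ones included) gets the synonym boost.
    clauses += [f"({syn})^{synonym_boost}" for syn in synonyms]
    # The whole group stays mandatory, like the "+sbc" that went in.
    return "+(" + " ".join(clauses) + ")"


print(expand_with_synonyms("sbc", ["sb", "small block"], 2.0, 0.5))
# +((sbc)^2.0 (sb)^0.5 (small block)^0.5)
```

Each of those clauses is then handed to the regular edismax machinery
separately, which is where the qf boosts and mm come into play.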

It appears that you also have synonyms being processed at query time on
the prodnumbertext field. For example, when (sbc)^2.0 enters the normal
query stage, it gets all the qf, pf, ps and tie modifiers applied, so the
first clause turns into something like

"(body:sbc^0.5 | productinfo:sbc^1.0 | keywords:sbc^2.0 | prodname:sbc^10.0
| prodnumbertext:sbc^20.0)^2.0"
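To make the qf/tie step concrete, here is a small Python sketch (helper
names parse_qf, dismax_clause and dismax_score are hypothetical, not
Solr code). It rebuilds the clause string above from your qf value and
shows how a dismax clause combines per-field scores: the best field
score plus tie times the sum of the other fields' scores. The per-field
scores in the usage line are made-up numbers.

```python
def parse_qf(qf: str) -> list[tuple[str, float]]:
    # "field^boost" pairs separated by whitespace; a missing boost means 1.0
    fields = []
    for part in qf.split():
        name, _, boost = part.partition("^")
        fields.append((name, float(boost) if boost else 1.0))
    return fields


def dismax_clause(term: str, qf: str, clause_boost: float) -> str:
    # One per-field term query per qf entry, joined as a disjunction-max
    inner = " | ".join(f"{name}:{term}^{boost}" for name, boost in parse_qf(qf))
    return f"({inner})^{clause_boost}"


def dismax_score(field_scores: list[float], tie: float) -> float:
    # Best-matching field dominates; other fields add tie * their scores
    best = max(field_scores)
    return best + tie * (sum(field_scores) - best)


QF = "body^0.5 productinfo^1.0 keywords^2.0 prodname^10.0 prodnumbertext^20.0"
print(dismax_clause("sbc", QF, 2.0))
print(dismax_score([1.0, 4.0, 2.0], tie=0.1))  # made-up per-field scores
```

With tie=0.1 the score is dominated by whichever single field matches
best, which is why the very large prodnumbertext^20.0 boost matters so
much here.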

Then the query-time synonym expansion on prodnumbertext, combined with a
multi-word synonym ("small block") and the default mm of 100% (see
https://cwiki.apache.org/confluence/display/solr/The+DisMax+Query+Parser#TheDisMaxQueryParser-Themm(MinimumShouldMatch)Parameter),
turns that clause into

(((prodnumbertext:sbc prodnumbertext:sb prodnumbertext:small)
prodnumbertext:block)~2)^20.0
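Here is a simplified Python sketch of how an mm value becomes the "~N"
on a clause, following the rules in the DisMax page linked above:
integers are used directly (negative means "total minus N"), and
percentages are rounded down (negative means that percentage may be
missing). The function name min_should_match is hypothetical, and it
deliberately ignores Solr's conditional specs such as "2<-25%".

```python
import math


def min_should_match(optional_clauses: int, mm: str) -> int:
    """Simplified mm calculation: plain integers and percentages only."""
    mm = mm.strip()
    if mm.endswith("%"):
        pct = float(mm[:-1])
        # Percentage of the optional clauses, rounded down
        computed = math.floor(optional_clauses * abs(pct) / 100)
        # Negative percentage: that share of the clauses may be missing
        result = computed if pct >= 0 else optional_clauses - computed
    else:
        n = int(mm)
        # Negative integer: that many clauses may be missing
        result = n if n >= 0 else optional_clauses + n
    return max(0, result)


# The expanded prodnumbertext clause has two optional sub-clauses:
# (sbc sb small) and block, so mm=100% requires both of them.
print(min_should_match(2, "100%"))
```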

The ~2 comes from mm=100% combined with the multi-word synonym "small
block". This hurts your results, because a match in prodnumbertext now
has to contain both one of sbc/sb/small and "block", effectively "sbc
block", "sb block" or "small block", which in practice is only going to
match "small block". Check out the section "Multi-word synonyms won't
work as phrase queries" in
https://nolanlawson.com/2012/10/31/better-synonym-handling-in-solr/ for
more info.
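A toy Python model of why that clause filters out your "sbc" products:
each optional sub-clause is a set of alternative terms, and a document
has to satisfy at least N of them. The product titles below are
hypothetical, and the model ignores analysis (lowercasing, stemming,
positions) entirely; it only shows the boolean counting.

```python
# Sub-clauses of ((sbc sb small) block)~2 on prodnumbertext
CLAUSES = [{"sbc", "sb", "small"}, {"block"}]


def boolean_match(doc_terms: set[str], clauses: list[set[str]],
                  min_should_match: int) -> bool:
    # Count how many optional sub-clauses the document satisfies
    satisfied = sum(1 for alternatives in clauses if doc_terms & alternatives)
    # A query made only of optional clauses still needs at least one match
    required = max(1, min_should_match)
    return satisfied >= required


for title in ["sbc heads", "small block chevy", "sb intake"]:  # hypothetical
    terms = set(title.split())
    print(title, boolean_match(terms, CLAUSES, 2), boolean_match(terms, CLAUSES, 1))
```

With ~2 only the "small block" title matches; relaxing the requirement
to 1 lets the "sbc" and "sb" titles through again.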

Advice: make sure in the schema that none of the fields you are running
queries against do any complex query-time analysis; in particular, make
sure they aren't doing additional synonym resolution against the same
synonyms file.

I think you are getting hit by this mm issue. Try tuning mm way down to
something like 0.01% and see how the matches go.
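One way to run that experiment is to send your existing parameters plus
an explicit mm and debugQuery=true, then compare the parsed query in the
debug output. A minimal Python sketch that only builds the request query
string (append it to your own /select handler URL; no collection name is
assumed), using the parameters from the quoted message and the 0.01%
value suggested above:

```python
from urllib.parse import urlencode

params = {
    "q": "sbc",
    "defType": "synonym_edismax",
    "qf": "body^0.5 productinfo^1.0 keywords^2.0 prodname^10.0 prodnumbertext^20.0",
    "pf": "productinfo^1 body^5 keywords^10 prodname^50",
    "pf2": "productinfo^1 body^5 keywords^10 prodname^50",
    "pf3": "productinfo^1 body^5 keywords^10 prodname^50",
    "ps": 1,
    "tie": 0.1,
    "synonyms": "true",
    "synonyms.originalBoost": 2.0,
    "synonyms.synonymBoost": 0.5,
    # Explicit mm overrides the default that produced the ~2 above
    "mm": "0.01%",
    # Include the parsed query in the response so the ~N can be checked
    "debugQuery": "true",
}

query_string = urlencode(params)  # "%" is percent-encoded as %25
print(query_string)
```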



On Fri, Jun 3, 2016 at 2:21 PM, MaryJo Sminkey <mjsminkey@gmail.com> wrote:

> Okay, so big thanks for the help with getting the hon_lucene_synonyms
> plugin working. That is a big load off to finally have a solution in
> place for all our multi-term synonyms. We did find that the information
> in Step 8 about the plugin showing "SynonymExpandingExtendedDismaxQParser"
> for QParser does not seem to be correct; we only ever get
> "ExtendedDismaxQParser", but the synonym expansion is definitely working.
>
> In implementing it though, the one thing I'm still having an issue with is
> trying to figure out how I can get results on the original term to appear
> first in our results and matches on the synonyms lower in the results. The
> plugin includes settings for an originalboost and synonymboost, but that
> doesn't seem to be working along with all the other edismax boosts I'm
> doing. We search across a number of fields, each with their own boost and
> then do phrase searches with boosts as well. My params look like this:
>
> params["defType"] = 'synonym_edismax';
> params["qf"] = 'body^0.5 productinfo^1.0 keywords^2.0 prodname^10.0 prodnumbertext^20.0';
> params["pf"] = 'productinfo^1 body^5 keywords^10 prodname^50';
> params["pf2"] = 'productinfo^1 body^5 keywords^10 prodname^50';
> params["pf3"] = 'productinfo^1 body^5 keywords^10 prodname^50';
> params["ps"] = 1;
> params["tie"] = 0.1;
> params["synonyms"] = true;
> params["synonyms.originalBoost"] = 2.0;
> params["synonyms.synonymBoost"] = 0.5;
>
> And here's an example of what the plugin gives me for a search on "sbc",
> which includes synonyms for "sb" and "small block". I don't really know
> enough about this to figure out exactly what it's doing, but since the
> first results I get are all ones with "small block" in the name, while
> the ones with "sbc" in the prodname field, which should be first, are
> buried about 1000 documents in, I know the originalBoost and synonymBoost
> aren't working with all this other stuff. Any ideas how to fix this?
> With the normal synonym filter we just set up copies of the fields that
> could have synonyms, with that filter applied, and gave those a lower
> boost. Not sure how to make it work with this custom query parser though.
>
> +((prodname:sbc^10.0 | body:sbc^0.5 | productinfo:sbc | keywords:sbc^2.0 |
> (((prodnumbertext:sbc prodnumbertext:small prodnumbertext:sb)
> prodnumbertext:block)~2)^20.0)~0.1^2.0 (((+(prodname:sb^10.0 | body:sb^0.5
> | productinfo:sb | keywords:sb^2.0 | (((prodnumbertext:sb
> prodnumbertext:small prodnumbertext:sbc) prodnumbertext:block)~2)^20.0)~0.1
> ()))^0.5) (((+(((prodname:small^10.0 | body:small^0.5 | productinfo:small |
> keywords:small^2.0 | prodnumbertext:small^20.0)~0.1 (prodname:block^10.0 |
> body:block^0.5 | productinfo:block | keywords:block^2.0 |
> prodnumbertext:block^20.0)~0.1)~2) (productinfo:"small block"~1 |
> body:"small block"~1^5.0 | keywords:"small block"~1^10.0 | prodname:"small
> block"~1^50.0)~0.1 (productinfo:"small block"~1 | body:"small block"~1^5.0
> | keywords:"small block"~1^10.0 | prodname:"small
> block"~1^50.0)~0.1))^0.5)) ()
>
>
> Mary Jo
>
