lucene-solr-user mailing list archives

From Sravan Kumar <sra...@caavo.com>
Subject Re: Bi Gram token generation with fuzzy searches
Date Thu, 08 Feb 2018 03:54:37 GMT
@Emir: The 'sow' parameter in edismax, together with the nested query
'_query_', works. Some tuning is still needed to get the desired relevancy.

@Walter: It would be nice to have SOLR-629 integrated into the project. As
Emir suggested, _query_ covers my needs by applying the fuzzy parameter to
the query. Anyway, I will apply the patch and give it a try.


On Wed, Feb 7, 2018 at 8:42 PM, Walter Underwood <wunder@wunderwood.org>
wrote:

> I think you need the feature in SOLR-629 that adds fuzzy to edismax.
>
> https://issues.apache.org/jira/browse/SOLR-629
>
> The patch on that issue is for Solr 4.x, but I believe someone is working
> on a new patch.
>
> wunder
> Walter Underwood
> wunder@wunderwood.org
> http://observer.wunderwood.org/  (my blog)
>
> > On Feb 7, 2018, at 2:10 AM, Emir Arnautović <emir.arnautovic@sematext.com> wrote:
> >
> > Hi Sravan,
> > Edismax has a 'sow' (split on whitespace) parameter; with sow=false,
> > edismax passes the whole query to field analysis instead of splitting it
> > on whitespace first. I am not sure how that will work with fuzzy search,
> > though. What you might do is use the _query_ syntax to separate the
> > shingle and non-shingle queries, e.g.
> > q=_query_:"{!edismax sow=false qf=title_bigrams v=$v}" OR
> > _query_:"{!edismax qf=title v=$v}"&v=some movie title
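A minimal Python sketch of how a client might send the nested-query request above. It assumes the default lucene parser handles q (set explicitly here with defType=lucene) and that the field names title and title_bigrams match the schema; both are assumptions for illustration. The local parameter v=$v dereferences the separate request parameter v, so the user text is sent once and reused by both clauses.

```python
from urllib.parse import parse_qs, urlencode


def build_nested_params(user_text: str) -> dict:
    """Build request parameters for the shingle + plain title nested query."""
    # Clause 1: sow=false so the whole string reaches the bigram analyser.
    # Clause 2: normal edismax behaviour against the plain title field.
    # v=$v makes each nested parser read its query text from the "v" param.
    q = ('_query_:"{!edismax sow=false qf=title_bigrams v=$v}" OR '
         '_query_:"{!edismax qf=title v=$v}"')
    return {"defType": "lucene", "q": q, "v": user_text}


params = build_nested_params("some movie title")
query_string = urlencode(params)  # URL-encodes braces, quotes, "$" and spaces
print(query_string)
```

Keeping the user text in its own parameter also avoids having to escape quotes inside the nested `_query_:"..."` strings.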
> >
> > HTH,
> > Emir
> > --
> > Monitoring - Log Management - Alerting - Anomaly Detection
> > Solr & Elasticsearch Consulting Support Training - http://sematext.com/
> >
> >
> >
> >> On 7 Feb 2018, at 10:55, Sravan Kumar <sravan@caavo.com> wrote:
> >>
> >> We have the following two fields for our movie title search:
> >> - title without symbols:
> >> a custom analyser with WordDelimiterFilterFactory, SynonymFilterFactory
> >> and other filters that retain only alphanumeric characters.
> >> - title with word bigrams:
> >> a custom analyser with solr.ShingleFilterFactory that generates word
> >> bigram tokens with '_' as the separator.
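To make the bigram field concrete, here is a rough Python model of what it produces. It assumes the analyser lowercases and keeps only alphanumeric runs, and that the shingle filter is configured for 2-word shingles with tokenSeparator="_" and outputUnigrams="false" (ShingleFilterFactory emits unigrams by default). It is a simplified stand-in, not Lucene's implementation.

```python
import re


def analyse(text: str) -> list[str]:
    """Stand-in for the title analyser: lowercase and keep alphanumeric runs."""
    return re.findall(r"[a-z0-9]+", text.lower())


def word_bigrams(tokens: list[str], separator: str = "_") -> list[str]:
    """Join every pair of adjacent tokens, like a 2-word shingle filter."""
    return [separator.join(pair) for pair in zip(tokens, tokens[1:])]


print(word_bigrams(analyse("The Pursuit of Happyness")))
# ['the_pursuit', 'pursuit_of', 'of_happyness']
```

A single-token input yields no shingles at all in this model, which is why a one-word query never matches the bigram field.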
> >>
> >> A custom similarity class is used to set the tf and idf values to 1.
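One way to see the effect of pinning tf and idf to 1: in a classic TF-IDF weighting a term's contribution grows with its frequency in the document and with its rarity in the index, whereas with both factors fixed at 1 every matching query term contributes the same amount. The Python sketch below compares the two; the classic formula is modelled loosely on Lucene's ClassicSimilarity and is an assumption for illustration, and field-length norms and boosts are ignored.

```python
import math


def classic_weight(freq: int, doc_freq: int, doc_count: int) -> float:
    """Classic TF-IDF shape: sqrt term frequency times smoothed log idf."""
    tf = math.sqrt(freq)
    idf = 1.0 + math.log((doc_count + 1) / (doc_freq + 1))
    return tf * idf


def flat_weight(freq: int, doc_freq: int, doc_count: int) -> float:
    """tf and idf both pinned to 1: any match contributes exactly 1."""
    return 1.0 if freq > 0 else 0.0


def score(doc_tokens, query_terms, weight, doc_freqs, doc_count):
    """Sum per-term weights over query terms that occur in the document."""
    return sum(
        weight(doc_tokens.count(term), doc_freqs.get(term, 0), doc_count)
        for term in query_terms
        if term in doc_tokens
    )


doc = ["pursuit", "of", "happyness", "happyness"]
query = ["pursuit", "happyness", "missing"]
doc_freqs = {"pursuit": 3, "happyness": 1}
print(score(doc, query, flat_weight, doc_freqs, 1000))  # 2.0
print(score(doc, query, classic_weight, doc_freqs, 1000))
```

With the flat weighting the score is simply the number of matched query terms, so ranking is driven by which terms (and which bigrams) match rather than by how often or how rare they are.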
> >>
> >> The edismax query parser is used for all searches. Phrase boosting (pf)
> >> is also used.
> >>
> >> There are a couple of issues when searching:
> >> 1> The bigram field doesn't generate bigrams if the whitespace in the
> >> query is not escaped.
> >> - For example, if the query is "pursuit of happyness", no bigrams are
> >> generated. This is because the edismax query parser splits the query on
> >> whitespace before passing it to the analyser (correct me if I am wrong).
> >> With "pursuit\ of\ happyness", however, they are generated, because the
> >> string passed to the analyser still contains the whitespace.
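The behaviour described in 1> can be modelled in a few lines of Python. This is a deliberately simplified picture, assuming that with sow=true each whitespace-separated chunk is analysed on its own, while escaped whitespace (or sow=false) lets the analyser see the whole string; the real edismax parser does considerably more than this.

```python
import re


def analyse(text: str) -> list[str]:
    """Stand-in analyser: lowercase and keep alphanumeric runs."""
    return re.findall(r"[a-z0-9]+", text.lower())


def word_bigrams(tokens: list[str]) -> list[str]:
    """2-word shingles joined with '_'."""
    return ["_".join(pair) for pair in zip(tokens, tokens[1:])]


def split_on_whitespace(query: str) -> list[str]:
    """Split on whitespace that is not backslash-escaped, then unescape."""
    chunks = re.split(r"(?<!\\)\s+", query.strip())
    return [chunk.replace("\\ ", " ") for chunk in chunks if chunk]


def bigrams_for_query(query: str, sow: bool = True) -> list[str]:
    """Bigram tokens the shingle field would see for a given query string."""
    if sow:
        chunks = split_on_whitespace(query)   # each chunk analysed alone
    else:
        chunks = [query.replace("\\ ", " ")]  # whole string analysed once
    return [bg for chunk in chunks for bg in word_bigrams(analyse(chunk))]


print(bigrams_for_query("pursuit of happyness"))             # []
print(bigrams_for_query(r"pursuit\ of\ happyness"))          # ['pursuit_of', 'of_happyness']
print(bigrams_for_query("pursuit of happyness", sow=False))  # ['pursuit_of', 'of_happyness']
```

Each single-word chunk has no neighbour to pair with, so splitting before analysis can never produce shingles.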
> >>
> >> 2> Fuzzy search doesn't work in whitespace-escaped queries.
> >> Ex: "pursuit~2\ of\ happiness~1"
> >>
> >> 3> Edismax's phrase boosting doesn't work as expected in fuzzy queries
> >> whose whitespace is not escaped.
> >>
> >> If the query is "pursuit~2 of happiness~1" (without escaping whitespace),
> >> fuzzy queries are generated in the parsed query:
> >> (title_name:pursuit~2), (title_name:happiness~1)
> >> But the edismax pf (phrase boost) generates a query like
> >> title_name:"pursuit (2 pursuit2) of happiness (1 happiness1)"
> >> This means the analyser received the original query, including the fuzzy
> >> operators, when building the phrase boost.
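The extra terms in that phrase are what a word-delimiter step would produce from the raw text "pursuit~2". The Python sketch below imitates that step, assuming the title analyser's WordDelimiterFilterFactory is configured to generate word parts and number parts and to catenate all parts; the exact flags in the real schema are not given in the thread.

```python
import re


def word_delimiter(token: str) -> list[str]:
    """Imitate a word delimiter filter with word parts, number parts and
    catenate-all enabled (assumed settings): split on non-alphanumerics and
    letter/digit transitions, then also emit the parts glued together."""
    parts = re.findall(r"[A-Za-z]+|[0-9]+", token)
    return parts + ["".join(parts)] if len(parts) > 1 else parts


def analyse_raw(text: str) -> list[str]:
    """Analyse the raw user text, fuzzy operators included."""
    return [term for chunk in text.split() for term in word_delimiter(chunk)]


print(analyse_raw("pursuit~2 of happiness~1"))
# ['pursuit', '2', 'pursuit2', 'of', 'happiness', '1', 'happiness1']
```

The "2"/"pursuit2" and "1"/"happiness1" terms line up with the alternatives in the pf phrase above, which is consistent with the phrase being built from the unparsed text.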
> >>
> >>
> >> 1> How should whitespace be handled with filters like
> >> solr.ShingleFilterFactory to generate bigrams?
> >> 2> If generating bigrams requires escaped whitespace and fuzzy search
> >> requires unescaped whitespace, how do we accommodate both in a single
> >> Solr request and score them together?
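One possible answer to 2>, building on the nested-query idea, is to let the client derive two strings from the user's input: the text as typed, fuzzy operators included, for the plain title field, and a copy with the fuzzy operators stripped for the bigram field and the phrase boost. The Python sketch below assumes the hypothetical request parameter names fuzzyq and plainq (chosen to avoid clashing with Solr's own fq), the field names title and title_bigrams, and that pf is not also set in the request handler defaults. The regex is a simplification that would also strip phrase slop such as "a b"~2.

```python
import re
from urllib.parse import urlencode

# Matches fuzzy operators such as "~", "~1" or "~0.8" attached to a term.
FUZZY_OPERATOR = re.compile(r"~[0-9.]*")


def strip_fuzzy(text: str) -> str:
    """Remove fuzzy operators and normalise whitespace."""
    return " ".join(FUZZY_OPERATOR.sub("", text).split())


def build_params(user_text: str) -> dict:
    """Fuzzy clause on the plain title field; shingle + phrase clause on clean text."""
    q = ('_query_:"{!edismax qf=title v=$fuzzyq}" OR '
         '_query_:"{!edismax sow=false qf=title_bigrams pf=title v=$plainq}"')
    return {
        "defType": "lucene",
        "q": q,
        "fuzzyq": user_text,               # hypothetical param: text as typed
        "plainq": strip_fuzzy(user_text),  # hypothetical param: operators removed
    }


params = build_params("pursuit~2 of happiness~1")
print(params["plainq"])  # pursuit of happiness
print(urlencode(params))
```

Because both clauses are OR'ed inside one query, each matching document's score combines the fuzzy match on the title field with the bigram and phrase matches on the clean text.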
> >>
> >>
> >>
> >> --
> >> Regards,
> >> Sravan
> >
>
>


-- 
Regards,
Sravan
