lucene-solr-user mailing list archives

From Ashwin Ramesh <ash...@canva.com>
Subject Re: Dealing with multi-word keywords and SOW=true
Date Mon, 30 Sep 2019 23:24:09 GMT
Thanks Erick, that seems to work!

Should I leave it in qf also? For example the query "blue dog" may be
represented as separate tokens in the keyword index.
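
For anyone finding this thread later, here is a rough sketch of how I understood Erick's suggestion: keep the tokenized fields in qf and add the keywords field as an explicit quoted phrase, so the whole phrase reaches the KeywordTokenizer field as a single term. The helper names (escape_phrase, build_params) are made up for illustration, the qf list is just the other fields mentioned in the thread, and combining the clause with OR is my assumption; how it interacts with q.op and mm will depend on your setup. The free-text part of the query is not escaped here.

```python
from urllib.parse import urlencode


def escape_phrase(text: str) -> str:
    """Escape backslashes and double quotes so the text is safe inside a quoted phrase."""
    return text.replace("\\", "\\\\").replace('"', '\\"')


def build_params(user_query: str) -> dict:
    """Build edismax request params (hypothetical helper, not part of Solr)."""
    phrase = escape_phrase(user_query)
    return {
        "defType": "edismax",
        # The quoted clause is analyzed as one unit, so the keywords field
        # sees the single token "ice cream" rather than "ice" and "cream".
        "q": f'({user_query}) OR keywords:"{phrase}"',
        # Assumption: only the tokenized text fields remain in qf.
        "qf": "title description",
    }


params = build_params("ice cream")
query_string = urlencode(params)
print(query_string)
```

The resulting query string can be appended to the select handler URL of whatever collection you are querying.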



On Mon, Sep 30, 2019 at 9:32 PM Erick Erickson <erickerickson@gmail.com>
wrote:

> Have you tried taking your keywords field out of the "qf" param and adding
> it explicitly, as keywords:"ice cream"?
>
> Best,
> Erick
>
> > On Sep 30, 2019, at 5:27 AM, Ashwin Ramesh <ashwin@canva.com> wrote:
> >
> > Hi everybody,
> >
> > I am using the edismax parser and have noticed a very specific behaviour
> > with how sow=true (default) handles multiword keywords.
> >
> > We have a field called 'keywords', which uses the general
> > KeywordTokenizerFactory. There are also other text fields, such as title
> > and description.
> >
> > When we index a document with a keyword "ice cream", for example, we know
> > it gets indexed into that field as "ice cream".
> >
> > However, at query time, I noticed that if we run an Edismax query:
> > q=ice cream
> > qf=keywords
> >
> > I do not get that document back as a match. This is due to sow=true
> > splitting the user's query and the final tokens not being present in the
> > keywords field.
> >
> > I was wondering what the best practice for this is. Some thoughts I
> > have had:
> >
> > 1. Index multi-word keywords with hyphens or something similar. E.g. "ice
> > cream" -> "ice-cream".
> > 2. Additionally index the separate words as keywords. E.g. "ice cream"
> > -> "ice cream", "ice", "cream". However, this method loses the intent
> > (q=ice would return this document).
> > 3. Add a boost query which is an edismax query where we explicitly set
> > sow=false and add a huge boost. E.g. bq={!edismax qf=keywords^1000
> > sow=false bq="" boost="" pf="" tie=1.00 v="ice cream"}
> >
> > Is there a standard industry practice for handling this type of problem?
> > Keep in mind that the other text fields may also include these terms. E.g.
> > title="This is ice cream", which would match the query. This specific
> > problem affects the keywords field because the indexing pipeline does not
> > tokenize keywords.
> >
> > Thank you for all your amazing help,
> >
> > Regards,
> >
> > Ash
> >
> > --
> > *P.S. We've launched a new blog to share the latest ideas and case
> studies
> > from our team. Check it out here: product.canva.com
> > <https://product.canva.com/>. ***
> > ** <https://www.canva.com/>Empowering the
> > world to design
> > Also, we're hiring. Apply here!
> > <https://about.canva.com/careers/>
> > <https://twitter.com/canva>
> > <https://facebook.com/canva> <https://au.linkedin.com/company/canva>
> > <https://twitter.com/canva>  <https://facebook.com/canva>
> > <https://au.linkedin.com/company/canva>  <https://instagram.com/canva>
