lucene-solr-user mailing list archives

From "Audrey Lorberfeld - Audrey.Lorberfeld@ibm.com" <Audrey.Lorberf...@ibm.com>
Subject Re: Re: Re: Anyone have experience with Query Auto-Suggestor?
Date Fri, 24 Jan 2020 14:41:27 GMT
Hi Alessandro,

I'm so happy there is someone who's done extensive work with QAC here! 

Right now, we measure nDCG via a Dynamic Bayesian Network (DBN). To break it down:
- We use a DBN model to generate a "score" for each query_url pair.
- We then plug that score into a mathematical formula we found in a research paper (happy
to share the paper if you're interested) to assign each pair a label from 0 to 4.
- We then cross-reference the scored & labeled query_url pairs with 1k of our system's
top queries and 1k of our system's random queries.
- We use that dataset as our ground truth.
- We then query the system in real time each day for those 2k queries, label the results, and
compare those labels with our ground truth to get our system's nDCG.
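
A minimal sketch of the last step, for illustration only. It assumes the ground truth is available as a mapping from query to {url: label 0-4}, that URLs without a label count as 0, and that the common exponential-gain form of nDCG is used; the function names and the cutoff `k` are hypothetical, and the actual pipeline may differ.

```python
import math


def dcg(labels):
    """Discounted cumulative gain for graded labels given in ranked order."""
    return sum((2 ** rel - 1) / math.log2(rank + 2) for rank, rel in enumerate(labels))


def system_ndcg(results_by_query, ground_truth, k=10):
    """Mean nDCG@k over all queries.

    results_by_query: query -> ranked list of URLs returned by the system today
    ground_truth:     query -> {url: label in 0..4} (assumed layout)
    """
    scores = []
    for query, urls in results_by_query.items():
        judged = ground_truth.get(query, {})
        # Labels of what the system actually returned; unjudged URLs count as 0.
        observed = [judged.get(url, 0) for url in urls[:k]]
        # Best possible ordering of everything judged for this query.
        ideal = sorted(judged.values(), reverse=True)[:k]
        ideal_dcg = dcg(ideal)
        scores.append(dcg(observed) / ideal_dcg if ideal_dcg > 0 else 0.0)
    return sum(scores) / len(scores) if scores else 0.0
```

Calling `system_ndcg` once a day on the 2k queries would give a single number to track over time; per-query scores can be kept instead if a weight per query is needed.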

I hope that makes sense! Lots of steps.

Because of computational overhead, we are pretty committed to using an external file and
a separate Solr core for our suggestions. We are also planning to use the Suggester to add
a little human nudge towards "successful" queries. I'm not sure whether that's what the Suggester
is really meant to do, but we are using it less as a naïve prefix-matcher and more as a
query-suggestion tool. So, if we know that the query "blue pages" is less successful than the
query "bluepages" (assuming we can identify the user's intent with this query), we will not show
suggestions that match "blue pages"; instead, we will show suggestions that match "bluepages."
It's sort of like a query rewrite, except with fuzzy prefix matching rather than the introduction
of synonyms/expansions.
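
A rough sketch of how such an external file could be generated, under several assumptions: a success score per query already exists, variants are grouped simply by ignoring case and whitespace (a stand-in for real intent detection), and the file is meant for Solr's file-based dictionary, which reads one suggestion per line with an optional weight after a tab. Scores are scaled to integers as a precaution in case fractional weights are truncated on load. All function names here are hypothetical.

```python
import re


def pick_best_variants(success_by_query):
    """For queries that differ only in case/whitespace, keep the most successful one."""
    best = {}
    for query, score in success_by_query.items():
        key = re.sub(r"\s+", "", query.lower())  # "blue pages" and "bluepages" share a key
        if key not in best or score > best[key][1]:
            best[key] = (query, score)
    # Highest-scoring suggestions first, purely for readability of the file.
    return sorted(best.values(), key=lambda pair: pair[1], reverse=True)


def dictionary_lines(entries, scale=1000):
    """Render (query, score) pairs as 'suggestion<TAB>integer weight' lines."""
    return [f"{query}\t{round(score * scale)}" for query, score in entries]


def write_dictionary(entries, path, scale=1000):
    """Write the dictionary file that the suggester core would load."""
    with open(path, "w", encoding="utf-8") as handle:
        handle.write("\n".join(dictionary_lines(entries, scale)) + "\n")
```

For example, `pick_best_variants({"blue pages": 0.2, "bluepages": 0.9})` keeps only "bluepages", so the less successful spelling never reaches the suggester.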

What we are concerned with currently is how to define a "successful" query. We have things
like abandonment rate, dwell time, etc., but if you have any advice on more ways to identify
successful queries, that'd be great. We want to stay away from defining success as "popularity,"
since that will just create a closed language system where people only query popular queries,
and those queries stay popular only because people are querying them (assuming people click
on the suggestions, of course).
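
One possible way to turn those signals into a single number, sketched under stated assumptions: a session counts as "satisfied" if it was not abandoned and had at least one click with a long dwell time, and the rate is smoothed towards a prior so that rarely seen queries are neither over- nor under-rated. The field names, the prior of 0.5, and the prior weight of 20 are all hypothetical choices, not something we have validated.

```python
from dataclasses import dataclass


@dataclass
class QueryStats:
    impressions: int          # times the query was issued
    abandoned: int            # sessions that ended without any click
    long_dwell_sessions: int  # sessions with a click whose dwell time passed a threshold


def success_score(stats, prior_rate=0.5, prior_weight=20):
    """Smoothed share of satisfied sessions, a rate rather than a raw count."""
    if stats.impressions <= 0:
        return prior_rate
    # A satisfied session cannot also be an abandoned one.
    satisfied = min(stats.long_dwell_sessions, stats.impressions - stats.abandoned)
    satisfied = max(satisfied, 0)
    return (satisfied + prior_rate * prior_weight) / (stats.impressions + prior_weight)
```

Because the score is a rate, a very popular query with mostly unsatisfied sessions ranks below a rarer query that usually satisfies its users, which is one way to avoid equating success with popularity.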

Let me know your thoughts!

On 1/23/20, 10:45 AM, "Alessandro Benedetti" <a.benedetti@sease.io> wrote:

    I have been working extensively on query autocompletion. These blog posts
    should be helpful to you:
    
    https://urldefense.proofpoint.com/v2/url?u=https-3A__sease.io_2015_07_solr-2Dyou-2Dcomplete-2Dme.html&d=DwIFaQ&c=jf_iaSHvJObTbx-siA1ZOg&r=_8ViuZIeSRdQjONA8yHWPZIBlhj291HU3JpNIx5a55M&m=0lExcWXK-kGTAfpnv-kU_LGminLzJjJKv6hYBFQG7iI&s=c149I_QBokd35FBMGaUxoBPMViUXAdZtVnkSKTINndE&e=

    https://urldefense.proofpoint.com/v2/url?u=https-3A__sease.io_2018_06_apache-2Dlucene-2Dblendedinfixsuggester-2Dhow-2Dit-2Dworks-2Dbugs-2Dand-2Dimprovements.html&d=DwIFaQ&c=jf_iaSHvJObTbx-siA1ZOg&r=_8ViuZIeSRdQjONA8yHWPZIBlhj291HU3JpNIx5a55M&m=0lExcWXK-kGTAfpnv-kU_LGminLzJjJKv6hYBFQG7iI&s=m8s2XvI7tR1t9bNaA4SI-w90MdbLZTYxc0mBMz8RMSw&e=

    
    Your idea of using search quality evaluation to drive the autocompletion is
    interesting.
    How do you currently calculate the NDCG for a query? What's your golden
    truth?
    Using that approach, you will favour query completions that your search
    engine is able to process better, not necessarily those closer to the
    user intent. Still, it could work.
    
    We should differentiate here between the suggester dictionary (where the
    suggestions come from; in your case it could be your extracted data) and
    the kind of suggestion (which in your case could be the free text suggester
    lookup).
    
    Cheers
    --------------------------
    Alessandro Benedetti
    Search Consultant, R&D Software Engineer, Director
    www.sease.io
    
    
    On Mon, 20 Jan 2020 at 17:02, David Hastings <hastings.recursive@gmail.com>
    wrote:
    
    > Not a bad idea at all. However, I've never used an external file before,
    > just a field in the index, so it's not an area I'm familiar with.
    >
    > On Mon, Jan 20, 2020 at 11:55 AM Audrey Lorberfeld -
    > Audrey.Lorberfeld@ibm.com <Audrey.Lorberfeld@ibm.com> wrote:
    >
    > > David,
    > >
    > > Thank you, that is useful. So, would you recommend using a (clean) field
    > > over an external dictionary file? We have lots of "top queries" and
    > > measure their nDCG. A thought was to programmatically generate an external
    > > file where the weight per query term (or phrase) == its nDCG. Bad idea?
    > >
    > > Best,
    > > Audrey
    > >
    > > On 1/20/20, 11:51 AM, "David Hastings" <hastings.recursive@gmail.com>
    > > wrote:
    > >
    > >     I've used this quite a bit. My biggest piece of advice is to choose a
    > >     field that you know is clean, with well-defined terms/words. You don't
    > >     want an autocomplete that has a massive dictionary; it will also make
    > >     the start/reload times pretty slow.
    > >
    > >     On Mon, Jan 20, 2020 at 11:47 AM Audrey Lorberfeld -
    > >     Audrey.Lorberfeld@ibm.com <Audrey.Lorberfeld@ibm.com> wrote:
    > >
    > >     > Hi All,
    > >     >
    > >     > We plan to incorporate a query autocomplete functionality into our
    > >     > search engine (like this:
    > https://urldefense.proofpoint.com/v2/url?u=https-3A__lucene.apache.org_solr_guide_8-5F1_suggester.html&d=DwIBaQ&c=jf_iaSHvJObTbx-siA1ZOg&r=_8ViuZIeSRdQjONA8yHWPZIBlhj291HU3JpNIx5a55M&m=L8V-izaMW_v4j-1zvfiXSqm6aAoaRtk-VJXA6okBs_U&s=vnE9KGyF3jky9fSi22XUJEEbKLM1CA7mWAKrl2qhKC0&e=
    > >     > ). And I was wondering if anyone has personal experience with this
    > >     > component and would like to share? Basically, we are just looking for
    > >     > some best practices from more experienced Solr admins so that we have
    > >     > a starting place to launch this in our beta.
    > >     >
    > >     > Thank you!
    > >     >
    > >     > Best,
    > >     > Audrey
    > >     >
    > >
    > >
    > >
    >
    
