lucene-java-user mailing list archives

From Michael McCandless <luc...@mikemccandless.com>
Subject Re: AutoSuggest with Query-Filters
Date Mon, 11 Mar 2013 12:36:36 GMT
On Mon, Mar 11, 2013 at 7:33 AM, Nils Knappmeier
<n.knappmeier@i-views.de> wrote:
> Hi,
>
>> This is tricky.
>>
>> You could build a separate suggester per category/zip code (or,
>> possibly prefix-code each suggestion with the category/zip code into
>> one suggester), but likely this will blow up (ie, if the same
>> suggestion often appears across zip codes / categories).  If your
>> suggestions are already highly orthogonal across category / zip code
>> then it may not blow up...
>>
>> Alternatively maybe you could store some info per-suggestion about
>> which zip code / category it appears in, using upcoming payloads
>> addition (see LUCENE-4820), and use that to filter each suggestion as
>> it arrives.
>>
>> But: have you confirmed this is really a problem in practice?  Ie,
>> typically suggestions have a strong a-priori rank based on eg how
>> often that query was asked (if suggestions come from your query logs,
>> like Google) or based on how popular that item is (if your suggestions
>> come from your content, like Netflix), in which case, if suggestions
>> are not that orthogonal, the risk of a bad suggestion may be very low?
>
> Maybe we had a misconception of the intended use case of the
> AnalyzingSuggester or the auto-suggest feature in general.
>
> Our suggestions should come solely from the index and not from a query log.
> I haven't even thought about using a query log as source. I think, in this
> case, it would be better to work on the index directly (using a
> PrefixTermEnum or so)...

It's fine for the source of the suggestions to be the index, but then
the input strings are necessarily whatever terms you previously
indexed, ie already analyzed/tokenized.

Ie, if you normalize accents and stem your tokens, then the input to
the suggester will be the normalized form, not the surface form, and it
will suggest only those normalized forms.

Whereas the power of the AnalyzingSuggester is to take the surface
forms (unanalyzed) as input, yet match suggestions based on the
analyzed form.  So the user will still see suggestions with their
accents and plurals intact.
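
To make the distinction concrete, here is a toy sketch in plain Java.
It is not the Lucene API: the class ToyAnalyzingSuggester, its
methods, and the crude analyze() step (lowercase, strip accents, drop
a trailing "s") are all made up for illustration. It only shows the
idea of matching on the analyzed form while returning the surface
form.

```java
import java.text.Normalizer;
import java.util.ArrayList;
import java.util.List;
import java.util.Locale;
import java.util.Map;
import java.util.TreeMap;

// Toy illustration only; this is NOT Lucene's AnalyzingSuggester.
class ToyAnalyzingSuggester {
    // Analyzed form -> surface forms. Sorted, so a prefix lookup is a range scan.
    private final TreeMap<String, List<String>> byAnalyzed = new TreeMap<>();

    // Hypothetical stand-in for an analyzer:
    // decompose accents, drop combining marks, lowercase, strip one trailing "s".
    static String analyze(String text) {
        String s = Normalizer.normalize(text, Normalizer.Form.NFD)
                .replaceAll("\\p{M}", "")
                .toLowerCase(Locale.ROOT);
        if (s.length() > 1 && s.endsWith("s")) {
            s = s.substring(0, s.length() - 1);
        }
        return s;
    }

    // Index the surface form under its analyzed form.
    void add(String surfaceForm) {
        byAnalyzed.computeIfAbsent(analyze(surfaceForm), k -> new ArrayList<>())
                  .add(surfaceForm);
    }

    // Analyze the typed prefix the same way, match against analyzed keys,
    // and return the original surface forms.
    List<String> lookup(String typedPrefix) {
        String key = analyze(typedPrefix);
        List<String> results = new ArrayList<>();
        for (Map.Entry<String, List<String>> e : byAnalyzed.tailMap(key, true).entrySet()) {
            if (!e.getKey().startsWith(key)) {
                break; // keys are sorted, so no later key can match
            }
            results.addAll(e.getValue());
        }
        return results;
    }
}
```

With "Cafés" added, a lookup for "cafe" goes through the analyzed key
"cafe" but hands back the surface form "Cafés". If you instead fed
the suggester terms pulled from the index, only "cafe" would be
available to suggest.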

Mike McCandless

http://blog.mikemccandless.com


