lucene-java-user mailing list archives

From Michael McCandless
Subject Re: AutoSuggest with Query-Filters
Date Mon, 11 Mar 2013 11:05:26 GMT
On Mon, Mar 11, 2013 at 6:31 AM, Nils Knappmeier wrote:
> Dear all,
> I have a request to implement an auto-suggest feature for our Lucene-based
> product.
> We have upgraded to Lucene 4.1 and intend to use the AnalyzingSuggester, but
> we cannot determine the correct way of using it for our request.
> We have problems with two aspects:
> 1) The suggester should suggest original (stored) field values. The API is
> built such that a LuceneDictionary is used to provide terms to the
> suggester. A Dictionary provides a BytesRefIterator, which is (e.g. in
> LuceneDictionary) implemented to return the tokenized and analyzed terms
> (with reduced umlauts and plural forms).
> What is the intended use here?

You shouldn't use LuceneDictionary, since it just enumerates the
tokens from the index.

Instead, make your own TermFreqIterator that provides the original
suggestion, and pass an Analyzer to AnalyzingSuggester to normalize
the surface forms.
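
To make the surface form vs. analyzed form distinction concrete, here
is a minimal sketch in plain Java. It is not Lucene code: the class
SurfaceFormSketch, its normalize() method (a crude stand-in for an
Analyzer) and the sample strings are all hypothetical. The only point
is that the normalized key is used for matching, while the original
text is what comes back as the suggestion.

```java
import java.text.Normalizer;
import java.util.ArrayList;
import java.util.Comparator;
import java.util.List;
import java.util.Locale;

// Illustration only; this is NOT the Lucene suggester API.
class SurfaceFormSketch {
    // One suggestion: original (surface) text, weight, and matching key.
    static final class Entry {
        final String surface;
        final long weight;
        final String key; // normalized form, used for matching only

        Entry(String surface, long weight) {
            this.surface = surface;
            this.weight = weight;
            this.key = normalize(surface);
        }
    }

    // Hypothetical stand-in for an Analyzer: strip diacritics and
    // lowercase, so "M\u00fcller" is matched as "muller".
    static String normalize(String s) {
        String decomposed = Normalizer.normalize(s, Normalizer.Form.NFD);
        return decomposed.replaceAll("\\p{M}", "").toLowerCase(Locale.ROOT);
    }

    private final List<Entry> entries = new ArrayList<>();

    void add(String surface, long weight) {
        entries.add(new Entry(surface, weight));
    }

    // Match on the normalized key, return the original surface forms,
    // highest weight first.
    List<String> lookup(String typed, int num) {
        String prefix = normalize(typed);
        List<Entry> hits = new ArrayList<>();
        for (Entry e : entries) {
            if (e.key.startsWith(prefix)) {
                hits.add(e);
            }
        }
        hits.sort(Comparator.comparingLong((Entry e) -> e.weight).reversed());
        List<String> out = new ArrayList<>();
        for (int i = 0; i < hits.size() && i < num; i++) {
            out.add(hits.get(i).surface);
        }
        return out;
    }

    public static void main(String[] args) {
        SurfaceFormSketch s = new SurfaceFormSketch();
        s.add("M\u00fcller B\u00e4ckerei", 5);
        s.add("Mullingar Pub", 9);
        System.out.println(s.lookup("mul", 10));
    }
}
```

A linear scan is used here purely for brevity; a real suggester uses a
far more compact and faster structure.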

> 2) We do want to suggest terms that have an empty search result. There are a

I think you meant "do not"?

> number of filters that can be set (zip-code, categories). Our problem is
> that there is no way to tell the suggester about these filters. Do we have
> to iterate all suggested terms and check for each one, if it provides
> results with the given filter settings?

This is tricky.

You could build a separate suggester per category/zip code (or
possibly prefix-code each suggestion with the category/zip code into
one suggester), but this will likely blow up in size (i.e., if the same
suggestion often appears across zip codes / categories, it is stored
once per combination).  If your suggestions are already highly
orthogonal across category / zip code then it may not blow up...
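
A rough sketch of the prefix-coding idea, in plain Java rather than
with a Lucene suggester: each entry is keyed as filter value +
separator + suggestion, and a lookup scans only the key range for the
chosen filter value. The class PrefixCodedSketch, the separator
character and the sample data are assumptions made for illustration.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.Map;
import java.util.TreeMap;

// Illustration only; a sorted map stands in for the suggester.
class PrefixCodedSketch {
    // Assumed never to occur inside a filter value or a suggestion.
    private static final char SEP = '\u001f';

    private final TreeMap<String, Long> entries = new TreeMap<>();

    void add(String filterValue, String suggestion, long weight) {
        entries.put(filterValue + SEP + suggestion, weight);
    }

    // Number of stored entries; grows with every (filter, suggestion) pair.
    int size() {
        return entries.size();
    }

    List<String> lookup(String filterValue, String typed, int num) {
        String from = filterValue + SEP + typed;
        String to = from + Character.MAX_VALUE; // end of the prefix range
        List<Map.Entry<String, Long>> hits =
            new ArrayList<>(entries.subMap(from, true, to, false).entrySet());
        // Highest weight first.
        hits.sort((a, b) -> Long.compare(b.getValue(), a.getValue()));
        List<String> out = new ArrayList<>();
        for (int i = 0; i < hits.size() && i < num; i++) {
            // Strip the filter prefix before returning the suggestion.
            out.add(hits.get(i).getKey().substring(filterValue.length() + 1));
        }
        return out;
    }

    public static void main(String[] args) {
        PrefixCodedSketch s = new PrefixCodedSketch();
        s.add("10115", "pizza", 10);
        s.add("80331", "pizza", 7);
        System.out.println(s.lookup("10115", "piz", 5) + " size=" + s.size());
    }
}
```

Note that "pizza" is stored once per zip code here, which is exactly
the duplication that makes this approach grow when suggestions overlap
heavily across filter values.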

Alternatively, maybe you could store some info per suggestion about
which zip code / category it appears in, using the upcoming payloads
addition (see LUCENE-4820), and use that to filter each suggestion as
it arrives.
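
The filtering step could look roughly like the sketch below. It is
plain Java and does not use the payloads API from LUCENE-4820; the
per-suggestion metadata is modeled as a simple set of zip codes, and
the names PostFilterSketch, Suggestion and filter() are hypothetical.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.HashSet;
import java.util.List;
import java.util.Set;

// Illustration only; the metadata a payload might carry is modeled
// here as a plain Set of zip codes.
class PostFilterSketch {
    static final class Suggestion {
        final String text;
        final long weight;
        final Set<String> zipCodes; // where this suggestion has results

        Suggestion(String text, long weight, Set<String> zipCodes) {
            this.text = text;
            this.weight = weight;
            this.zipCodes = zipCodes;
        }
    }

    // 'ranked' is assumed to be sorted by descending weight already,
    // the way a suggester would return its candidates.
    static List<String> filter(List<Suggestion> ranked, String zip, int num) {
        List<String> out = new ArrayList<>();
        for (Suggestion s : ranked) {
            if (out.size() == num) {
                break; // enough suggestions survived the filter
            }
            if (s.zipCodes.contains(zip)) {
                out.add(s.text);
            }
        }
        return out;
    }

    public static void main(String[] args) {
        List<Suggestion> ranked = Arrays.asList(
            new Suggestion("pizza", 10, new HashSet<>(Arrays.asList("10115", "80331"))),
            new Suggestion("pita", 7, new HashSet<>(Arrays.asList("80331"))),
            new Suggestion("pizzeria", 4, new HashSet<>(Arrays.asList("10115"))));
        System.out.println(filter(ranked, "10115", 2));
    }
}
```

Because some candidates are dropped, the caller would probably need to
ask the suggester for more than num results up front.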

But: have you confirmed this is really a problem in practice?  I.e.,
typically suggestions have a strong a priori rank based on e.g. how
often that query was asked (if suggestions come from your query logs,
like Google) or based on how popular that item is (if your suggestions
come from your content, like Netflix), in which case, if suggestions
are not that orthogonal, the risk of a bad suggestion may be very low?

Mike McCandless
