lucene-dev mailing list archives

From "Robert Muir (JIRA)" <>
Subject [jira] [Commented] (LUCENE-4845) Add AnalyzingInfixSuggester
Date Wed, 20 Mar 2013 00:49:16 GMT


Robert Muir commented on LUCENE-4845:

I managed to build the FreeDB suggest using this but ... it required a lot of RAM: it OOM'd
at 14 GB heap but finished successfully at 20 GB heap.

Took a longish time to build too, and made a biggish FST (more than 2X larger than the index):

I think it's because your FreeDB has a lot more words than my place names?

But really there must be an infixing limit for relevance reasons alone.

We should try the N prefix limit ... but I don't really like that. Maybe we should just offer
both approaches ...

Why is the N prefix limit so bad, but the edge-ngrams limit OK?
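
For readers unfamiliar with the trade-off discussed above, here is a minimal sketch of one possible reading of an "edge-ngrams limit": each token of a suggestion is indexed under its leading prefixes, but only up to a fixed length, so index size is bounded per token. This is not the code from the LUCENE-4845 patch and does not use any Lucene API; the class name EdgeNgramInfixSketch, the maxPrefixLen parameter, and the in-memory maps are all hypothetical and chosen only for illustration.

```java
import java.util.ArrayList;
import java.util.Collections;
import java.util.HashMap;
import java.util.HashSet;
import java.util.List;
import java.util.Locale;
import java.util.Map;
import java.util.Set;

/** Toy infix suggester (hypothetical, not the Lucene API): capped edge n-grams per token. */
final class EdgeNgramInfixSketch {
  private final int maxPrefixLen; // assumed cap on the length of indexed prefixes
  private final Map<String, Set<String>> prefixIndex = new HashMap<>();
  private final Map<String, Long> weights = new HashMap<>();

  EdgeNgramInfixSketch(int maxPrefixLen) {
    this.maxPrefixLen = maxPrefixLen;
  }

  /** Leading prefixes of a token, from length 1 up to maxLen characters. */
  static List<String> edgeNgrams(String token, int maxLen) {
    List<String> grams = new ArrayList<>();
    for (int len = 1; len <= Math.min(maxLen, token.length()); len++) {
      grams.add(token.substring(0, len));
    }
    return grams;
  }

  /** Lowercase and split on anything that is not a letter or digit. */
  static List<String> tokenize(String text) {
    List<String> tokens = new ArrayList<>();
    for (String t : text.toLowerCase(Locale.ROOT).split("[^\\p{L}\\p{N}]+")) {
      if (!t.isEmpty()) {
        tokens.add(t);
      }
    }
    return tokens;
  }

  /** Index a suggestion under the capped prefixes of every one of its tokens. */
  void add(String suggestion, long weight) {
    weights.put(suggestion, weight);
    for (String token : tokenize(suggestion)) {
      for (String gram : edgeNgrams(token, maxPrefixLen)) {
        prefixIndex.computeIfAbsent(gram, k -> new HashSet<>()).add(suggestion);
      }
    }
  }

  /** Every query token must be a prefix of some token in the suggestion. */
  List<String> lookup(String query, int topN) {
    Set<String> hits = null;
    for (String token : tokenize(query)) {
      // Query tokens longer than the cap are looked up by their capped prefix...
      String key = token.length() > maxPrefixLen ? token.substring(0, maxPrefixLen) : token;
      Set<String> matches = new HashSet<>();
      for (String candidate : prefixIndex.getOrDefault(key, Collections.<String>emptySet())) {
        // ...and then verified against the full token, since the index alone is too coarse.
        for (String candidateToken : tokenize(candidate)) {
          if (candidateToken.startsWith(token)) {
            matches.add(candidate);
            break;
          }
        }
      }
      if (hits == null) {
        hits = matches;
      } else {
        hits.retainAll(matches);
      }
    }
    List<String> ranked = new ArrayList<>(hits == null ? Collections.<String>emptySet() : hits);
    // Highest weight first; ties broken alphabetically so results are deterministic.
    ranked.sort((a, b) -> {
      int byWeight = Long.compare(weights.get(b), weights.get(a));
      return byWeight != 0 ? byWeight : a.compareTo(b);
    });
    return new ArrayList<>(ranked.subList(0, Math.min(topN, ranked.size())));
  }
}
```

With a cap of 3, a suggestion such as "Lone Star" is reachable from the query "sta" even though the match is not at the start of the suggestion, and each token contributes at most three index entries. The cost is that longer query tokens need the extra verification step shown in lookup, and ranking depends entirely on the supplied weights.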

> Add AnalyzingInfixSuggester
> ---------------------------
>                 Key: LUCENE-4845
>                 URL:
>             Project: Lucene - Core
>          Issue Type: Improvement
>          Components: modules/spellchecker
>            Reporter: Michael McCandless
>            Assignee: Michael McCandless
>             Fix For: 5.0, 4.3
>         Attachments: infixSuggest.png, LUCENE-4845.patch, LUCENE-4845.patch, LUCENE-4845.patch
> Our current suggester impls do prefix matching of the incoming text
> against all compiled suggestions, but in some cases it's useful to
> allow infix matching.  E.g., Netflix does infix suggestions in their
> search box.
> I did a straightforward impl, just using a normal Lucene index, and
> using PostingsHighlighter to highlight matching tokens in the
> suggestions.
> I think this likely only works well when your suggestions have a
> strong prior ranking (weight input to build), e.g. Netflix knows
> the popularity of movies.
