lucene-dev mailing list archives

From "Robert Muir (JIRA)" <j...@apache.org>
Subject [jira] [Commented] (LUCENE-4845) Add AnalyzingInfixSuggester
Date Tue, 19 Mar 2013 00:39:15 GMT

    [ https://issues.apache.org/jira/browse/LUCENE-4845?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13605868#comment-13605868 ]

Robert Muir commented on LUCENE-4845:
-------------------------------------

{quote}
I think so ... but then I worry about the FST blowing up. I guess if we limit how "deep" the
infixing can work that would limit the FST size ... but I'd rather not have that limit.
{quote}

But how is this any different than edge-ngrams up to a limit?

With words of <= 4 chars, this suggester avoids the typical bad complexity you would get
from an inverted index because the docids are pre-sorted in weight-order, so it can early
terminate.
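
To make the early-termination argument concrete, here is a minimal Python sketch. It is not the code in the patch. It assumes suggestions get doc ids in descending weight order and that every word's leading prefixes up to 4 characters are indexed as terms; `build_index`, `suggest`, and `MAX_PREFIX` are illustrative names only.

```python
from collections import defaultdict
from itertools import islice

MAX_PREFIX = 4  # assumed n-gram limit, matching the "<= 4 chars" case above


def build_index(suggestions):
    """Index (text, weight) pairs; doc ids follow descending weight."""
    docs = sorted(suggestions, key=lambda s: -s[1])
    postings = defaultdict(list)
    for doc_id, (text, _weight) in enumerate(docs):
        seen = set()
        for word in text.lower().split():
            for n in range(1, min(len(word), MAX_PREFIX) + 1):
                gram = word[:n]
                if gram not in seen:  # one posting per doc per term
                    seen.add(gram)
                    postings[gram].append(doc_id)
    return docs, postings


def suggest(docs, postings, prefix, k):
    """Return the k best suggestions for a short prefix, reading only k postings."""
    # Postings are already in weight order, so the first k entries are the answer.
    hits = islice(postings.get(prefix.lower(), []), k)
    return [docs[doc_id][0] for doc_id in hits]


docs, postings = build_index([
    ("Shanghai Noon", 50),
    ("The Shape of Water", 90),
    ("Shaun of the Dead", 70),
    ("Sharknado", 10),
])
top = suggest(docs, postings, "sha", 2)
```

Because each postings list is already in weight order, `suggest` never looks past the first `k` entries, however many suggestions match. The sketch deliberately does not handle prefixes longer than `MAX_PREFIX`; those are not single indexed terms.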

But as soon as you type that 5th character, it can blow up. I'm not saying it's likely, but it
can happen due to particulars of the content: for example, if you had place names and you typed
Shangh... and this prefix matched millions and millions of terms.
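
The blow-up can be sketched in Python as well. This hypothetical example assumes that a prefix longer than the indexed limit has to be expanded against a sorted term dictionary, so the work grows with the number of matching terms before any suggestion is returned. `expand_prefix` and the synthetic terms are illustrative only, not taken from the patch.

```python
from bisect import bisect_left


def expand_prefix(sorted_terms, prefix):
    """Return every term in a sorted term list that starts with prefix."""
    start = bisect_left(sorted_terms, prefix)
    matches = []
    for term in sorted_terms[start:]:
        if not term.startswith(prefix):
            break  # sorted order: no later term can match
        matches.append(term)
    return matches


# Synthetic term dictionary: many distinct terms share the same 6-char prefix.
terms = sorted(["shangh%05d" % i for i in range(10000)] + ["shape", "sharp"])
matching = expand_prefix(terms, "shangh")
```

Every term in `matching` has its own postings list to open and merge, which is the cost that a lookup on a single indexed term avoids.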

                
> Add AnalyzingInfixSuggester
> ---------------------------
>
>                 Key: LUCENE-4845
>                 URL: https://issues.apache.org/jira/browse/LUCENE-4845
>             Project: Lucene - Core
>          Issue Type: Improvement
>          Components: modules/spellchecker
>            Reporter: Michael McCandless
>            Assignee: Michael McCandless
>             Fix For: 5.0, 4.3
>
>         Attachments: infixSuggest.png, LUCENE-4845.patch, LUCENE-4845.patch
>
>
> Our current suggester impls do prefix matching of the incoming text
> against all compiled suggestions, but in some cases it's useful to
> allow infix matching.  E.g., Netflix does infix suggestions in their
> search box.
> I did a straightforward impl, just using a normal Lucene index, and
> using PostingsHighlighter to highlight matching tokens in the
> suggestions.
> I think this likely only works well when your suggestions have a
> strong prior ranking (weight input to build), e.g. Netflix knows
> the popularity of movies.

--
This message is automatically generated by JIRA.
If you think it was sent incorrectly, please contact your JIRA administrators
For more information on JIRA, see: http://www.atlassian.com/software/jira

---------------------------------------------------------------------
To unsubscribe, e-mail: dev-unsubscribe@lucene.apache.org
For additional commands, e-mail: dev-help@lucene.apache.org

