lucene-dev mailing list archives

From "Dawid Weiss (JIRA)" <j...@apache.org>
Subject [jira] [Commented] (SOLR-2378) FST-based Lookup (suggestions) for prefix matches.
Date Thu, 31 Mar 2011 11:33:05 GMT

    [ https://issues.apache.org/jira/browse/SOLR-2378?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13013932#comment-13013932 ]

Dawid Weiss commented on SOLR-2378:
-----------------------------------

I didn't have time to take care of this until now, apologies. So, looking at Lookup#lookup(),
I just wanted to clarify:

{code}
  /**
   * Look up a key and return possible completion for this key.
   * @param key lookup key. Depending on the implementation this may be
   * a prefix, misspelling, or even infix.
   * @param onlyMorePopular return only more popular results
   * @param num maximum number of results to return
   * @return a list of possible completions, with their relative weight (e.g. popularity)
   */
  public abstract List<LookupResult> lookup(String key, boolean onlyMorePopular, int num);
{code}

What does "onlyMorePopular" mean: more popular than... what? I see that TSTLookup and JaspellLookup
(Andrzej, will you confirm, please?) sort matches in a priority queue by their associated value
(frequency, I guess). This makes sense, but onlyMorePopular is misleading; it should be called
onlyMostPopular (native English speakers who know the subtleties, speak up if I'm right here).
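To make the "most popular" reading concrete, here is a minimal sketch of keeping only the num highest-weighted matches for a prefix with a bounded priority queue. It is an illustration only: TopCompletions, Completion and mostPopular are hypothetical names, the Map stands in for whatever the Dictionary provides, and none of this is taken from TSTLookup or JaspellLookup.

```java
import java.util.ArrayList;
import java.util.Collections;
import java.util.List;
import java.util.Map;
import java.util.PriorityQueue;

// Hypothetical helper, not part of Solr: keeps the `num` highest-weight completions.
final class TopCompletions {

  /** Hypothetical result type; it only stands in for Solr's LookupResult. */
  public static final class Completion implements Comparable<Completion> {
    public final String key;
    public final float weight;

    public Completion(String key, float weight) {
      this.key = key;
      this.weight = weight;
    }

    // Natural order is ascending weight, so a PriorityQueue of these is a min-heap.
    @Override
    public int compareTo(Completion other) {
      return Float.compare(weight, other.weight);
    }
  }

  /** Returns at most `num` keys starting with `prefix`, most popular first. */
  public static List<Completion> mostPopular(Map<String, Float> dict, String prefix, int num) {
    List<Completion> out = new ArrayList<>();
    if (num <= 0) {
      return out;
    }
    PriorityQueue<Completion> heap = new PriorityQueue<>();
    for (Map.Entry<String, Float> e : dict.entrySet()) {
      if (!e.getKey().startsWith(prefix)) {
        continue;
      }
      heap.add(new Completion(e.getKey(), e.getValue()));
      if (heap.size() > num) {
        heap.poll(); // evict the least popular candidate seen so far
      }
    }
    out.addAll(heap);
    Collections.sort(out, Collections.reverseOrder()); // highest weight first
    return out;
  }
}
```

With this reading the flag needs no reference point: the queue simply bounds the result to the top num entries by weight.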

I also wanted to confirm: the Dictionary can come from various sources, so we can't rely on
the presence of the built-in Lucene automaton, can we? Even if I wanted to reuse it, there'd be
no easy way to determine if it's a full automaton or a partial one (because of the
gaps/trimming)... I think I'll just implement the solution by building the automaton from
whatever Dictionary comes in and serializing/deserializing it, similar to TSTLookup.

Sounds ok?
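As a rough sketch of the build-then-serialize plan above, the following uses a plain character trie as a stand-in for the automaton. This is an assumption made for illustration: it does not use the Lucene FST package, PrefixTrie and all of its methods are hypothetical, and the byte layout is made up for this example rather than copied from TSTLookup.

```java
import java.io.ByteArrayInputStream;
import java.io.ByteArrayOutputStream;
import java.io.DataInputStream;
import java.io.DataOutputStream;
import java.io.IOException;
import java.io.UncheckedIOException;
import java.util.ArrayList;
import java.util.List;
import java.util.Map;
import java.util.TreeMap;

// Hypothetical stand-in for the automaton: a plain trie, NOT the Lucene FST API.
final class PrefixTrie {
  private final TreeMap<Character, PrefixTrie> children = new TreeMap<>();
  private float weight = -1f; // negative means "no key ends at this node"

  /** Adds one (key, weight) pair, e.g. while iterating over a dictionary. */
  void add(String key, float w) {
    PrefixTrie node = this;
    for (int i = 0; i < key.length(); i++) {
      node = node.children.computeIfAbsent(key.charAt(i), c -> new PrefixTrie());
    }
    node.weight = w;
  }

  /** Returns all stored keys starting with `prefix`, in lexicographic order. */
  List<String> complete(String prefix) {
    List<String> out = new ArrayList<>();
    PrefixTrie node = this;
    for (int i = 0; i < prefix.length() && node != null; i++) {
      node = node.children.get(prefix.charAt(i));
    }
    if (node != null) {
      node.collect(new StringBuilder(prefix), out);
    }
    return out;
  }

  private void collect(StringBuilder sb, List<String> out) {
    if (weight >= 0) {
      out.add(sb.toString());
    }
    for (Map.Entry<Character, PrefixTrie> e : children.entrySet()) {
      sb.append(e.getKey().charValue());
      e.getValue().collect(sb, out);
      sb.setLength(sb.length() - 1);
    }
  }

  /** Pre-order dump: weight, child count, then (label, subtree) for each child. */
  void write(DataOutputStream out) throws IOException {
    out.writeFloat(weight);
    out.writeInt(children.size());
    for (Map.Entry<Character, PrefixTrie> e : children.entrySet()) {
      out.writeChar(e.getKey().charValue());
      e.getValue().write(out);
    }
  }

  /** Reads back the layout produced by write(). */
  static PrefixTrie read(DataInputStream in) throws IOException {
    PrefixTrie node = new PrefixTrie();
    node.weight = in.readFloat();
    int n = in.readInt();
    for (int i = 0; i < n; i++) {
      char label = in.readChar();
      node.children.put(label, read(in));
    }
    return node;
  }

  /** Convenience wrapper: serialize to an in-memory byte array. */
  byte[] toBytes() {
    try {
      ByteArrayOutputStream bytes = new ByteArrayOutputStream();
      write(new DataOutputStream(bytes));
      return bytes.toByteArray();
    } catch (IOException e) {
      throw new UncheckedIOException(e);
    }
  }

  /** Convenience wrapper: rebuild a trie from toBytes() output. */
  static PrefixTrie fromBytes(byte[] data) {
    try {
      return read(new DataInputStream(new ByteArrayInputStream(data)));
    } catch (IOException e) {
      throw new UncheckedIOException(e);
    }
  }
}
```

Because the structure is always rebuilt from the incoming entries, it does not matter whether the source already had an automaton or whether that automaton was complete.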





> FST-based Lookup (suggestions) for prefix matches.
> --------------------------------------------------
>
>                 Key: SOLR-2378
>                 URL: https://issues.apache.org/jira/browse/SOLR-2378
>             Project: Solr
>          Issue Type: New Feature
>          Components: spellchecker
>            Reporter: Dawid Weiss
>            Assignee: Dawid Weiss
>              Labels: lookup, prefix
>             Fix For: 4.0
>
>
> Implement a subclass of Lookup based on finite state automata/transducers (Lucene FST
> package). This issue is for implementing a relatively basic prefix matcher; we will handle
> infixes and other types of input matches gradually.

