lucene-dev mailing list archives

From "Michael McCandless (JIRA)" <>
Subject [jira] [Commented] (LUCENE-6459) [suggest] Query Interface for suggest API
Date Thu, 14 May 2015 09:37:00 GMT


Michael McCandless commented on LUCENE-6459:

bq. We should use this API for other suggesters, but maybe in a separate issue 

Yes, definitely separate!

bq. In terms of new functionality, fuzzy, regex and context queries are added. 

OK, cool, these are nice additions.

bq. query-time context boosts

Cool: so you boost some contexts more than others, using ContextQuery.

bq. As we only allow integer index-time weights

I thought we accept long (not int) as index-time weight?  (But, I
think that really is overkill... maybe they should just be floats, like
per-field boosting at index time).  But we can worry about this

bq. one possibility would be to use index-weight + (Int.MAX * boost) instead of using MaxWeight
of suggestions

Sorry I don't understand the idea here?

bq. If you try to use ContextQuery against a field that you had not indexed contexts with
(using ContextSuggestField) do you see any error? Maybe this is too hard.

There should not be any error. A ContextQuery will never be run on a SuggestField

It seems like we could detect this mis-use, since CompletionTerms seems to know whether the
field was indexed with contexts or not?  I.e, if I accidentally try to run a ContextQuery
against a field indexed with only SuggestField, it seems like I should get an exception saying
I screwed up ... (similar to trying to run a PhraseQuery on a field that did not index positions)?
 Maybe add a simple test case?

A ContextQuery will never be run on a SuggestField, 
CompletionQuery rewrites appropriately given the type of the field (context-enabled or not).


OK, maybe that rewrite is the time to throw the exception?

This also makes non-context queries work as expected when run against ContextSuggestField

(as in the query is wrapped as a ContextQuery with no context filtering/boosting).

OK, for that direction it makes sense to allow ... good.
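
To make the two directions concrete, here is a minimal sketch of what such a rewrite-time check could look like. The class names ({{CompletionLikeQuery}}, {{ContextLikeQuery}}) and the {{fieldHasContexts}} flag are hypothetical stand-ins, not the classes in the patch, and the choice of {{IllegalStateException}} is only an assumption:

```java
final class RewriteCheckSketch {
    /** Hypothetical stand-in for a completion query; not a Lucene type. */
    static class CompletionLikeQuery {
        final String field;

        CompletionLikeQuery(String field) {
            this.field = field;
        }
    }

    /** Hypothetical stand-in for a context wrapper around another query. */
    static final class ContextLikeQuery extends CompletionLikeQuery {
        final CompletionLikeQuery inner;

        ContextLikeQuery(CompletionLikeQuery inner) {
            super(inner.field);
            this.inner = inner;
        }
    }

    /**
     * Rewrites a query given whether the target field was indexed with contexts.
     * The flag is assumed to come from whatever per-field metadata is available.
     */
    static CompletionLikeQuery rewrite(CompletionLikeQuery query, boolean fieldHasContexts) {
        boolean isContextQuery = query instanceof ContextLikeQuery;
        if (isContextQuery && !fieldHasContexts) {
            // Mis-use: context query against a field indexed without contexts.
            throw new IllegalStateException(
                "field \"" + query.field + "\" was indexed without contexts; "
                    + "cannot run a context query against it");
        }
        if (!isContextQuery && fieldHasContexts) {
            // Allowed direction: wrap with no context filtering or boosting.
            return new ContextLikeQuery(query);
        }
        return query;
    }
}
```

The only point of the sketch is that both cases can be decided in one place, from the same per-field flag.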

bq. Are you allowed to mix ContextSuggestField and SuggestField even for the same field name,
within one suggester?

No you are not. If mixed, CompletionFieldsConsumer will throw IllegalArgumentException upon

OK, excellent.
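
For illustration only, the kind of guard described here could be sketched as below. This is not how {{CompletionFieldsConsumer}} is implemented; {{FieldTypeGuardSketch}}, {{Kind}} and {{register}} are hypothetical names, and the sketch only shows the idea of rejecting a field name that is seen with both field types:

```java
import java.util.HashMap;
import java.util.Map;

final class FieldTypeGuardSketch {
    /** Hypothetical marker for how a suggest field was indexed. */
    enum Kind { PLAIN, WITH_CONTEXTS }

    // Remembers the first kind seen for each field name.
    private final Map<String, Kind> seen = new HashMap<>();

    /** Records the kind for a field name; rejects a conflicting kind. */
    void register(String fieldName, Kind kind) {
        Kind previous = seen.putIfAbsent(fieldName, kind);
        if (previous != null && previous != kind) {
            throw new IllegalArgumentException(
                "field \"" + fieldName + "\" was already indexed as " + previous
                    + "; cannot also index it as " + kind);
        }
    }
}
```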

Can we rename {{TopSuggestDocsCollector.num()}} to maybe .getCountToCollect or something a
bit more verbose?

Net/net this is a nice change, thanks [~areek]!

> [suggest] Query Interface for suggest API
> -----------------------------------------
>                 Key: LUCENE-6459
>                 URL:
>             Project: Lucene - Core
>          Issue Type: New Feature
>          Components: core/search
>    Affects Versions: 5.1
>            Reporter: Areek Zillur
>            Assignee: Areek Zillur
>             Fix For: Trunk, 5.x, 5.1
>         Attachments: LUCENE-6459.patch, LUCENE-6459.patch, LUCENE-6459.patch, LUCENE-6459.patch,
LUCENE-6459.patch, LUCENE-6459.patch
> This patch factors out common indexing/search API used by the recently introduced NRTSuggester. The
motivation is to provide a query interface for FST-based fields (*SuggestField* and *ContextSuggestField*)
for enabling suggestion scoring and more powerful automaton queries. 
> Previously, only prefix ‘queries’ with index-time weights were supported but we can
also support:
> * Prefix queries expressed as regular expressions:  get suggestions that match multiple
>       ** Example: _star\[wa\|tr\]_ matches _starwars_ and _startrek_
> * Fuzzy Prefix queries supporting scoring: get typo tolerant suggestions scored by how
close they are to the query prefix
>     ** Example: querying for _seper_ will score _separate_ higher than _superstitious_
> * Context Queries: get suggestions boosted and/or filtered based on their indexed contexts
(meta data)
>     ** Example: get typo tolerant suggestions on song names with prefix _like a roling_
boosting songs with genre _rock_ and _indie_
>     ** Example: get suggestion on all file names starting with _finan_ only for _user1_
and _user2_
> h3. Suggest API
> {code}
> SuggestIndexSearcher searcher = new SuggestIndexSearcher(reader);
> CompletionQuery query = ...
> TopSuggestDocs suggest = searcher.suggest(query, num);
> {code}
> h3. CompletionQuery
> *CompletionQuery* is used to query *SuggestField* and *ContextSuggestField*. A *CompletionQuery*
produces a *CompletionWeight*, which allows *CompletionQuery* implementations to pass in an
automaton that will be intersected with a FST and allows boosting and meta data extraction
from the intersected partial paths. A *CompletionWeight* produces a *CompletionScorer*. A
*CompletionScorer* executes a Top N search against the FST with the provided automaton, scoring
and filtering all matched paths. 
> h4. PrefixCompletionQuery
> Return documents with values that match the prefix of an analyzed term text 
> Documents are sorted according to their suggest field weight. 
> {code}
> PrefixCompletionQuery(Analyzer analyzer, Term term)
> {code}
> h4. RegexCompletionQuery
> Return documents with values that match the prefix of a regular expression
> Documents are sorted according to their suggest field weight.
> {code}
> RegexCompletionQuery(Term term)
> {code}
> h4. FuzzyCompletionQuery
> Return documents with values that have prefixes within a specified edit distance of an
analyzed term text.
> Documents are ‘boosted’ by the number of matching prefix letters of the suggestion
with respect to the original term text.
> {code}
> FuzzyCompletionQuery(Analyzer analyzer, Term term)
> {code}
> h5. Scoring
> {{suggestion_weight + (global_maximum_weight * boost)}}
> where {{suggestion_weight}}, {{global_maximum_weight}} and {{boost}} are all integers.

> {{boost = # of prefix characters matched}}
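
The fuzzy scoring formula above can be sketched in plain Java as follows. This is a rough illustration, not the patch's code: it assumes the boost is the length of the common leading prefix of the query text and the suggestion (the fuzzy automaton may count matched characters differently), and it widens the arithmetic to {{long}} so that multiplying by the maximum weight cannot overflow an {{int}}. All names are hypothetical.

```java
final class FuzzyScoreSketch {
    /** Number of leading characters shared by the query prefix and the suggestion. */
    static int matchedPrefixLength(String queryPrefix, String suggestion) {
        int limit = Math.min(queryPrefix.length(), suggestion.length());
        int i = 0;
        while (i < limit && queryPrefix.charAt(i) == suggestion.charAt(i)) {
            i++;
        }
        return i;
    }

    /** suggestion_weight + (global_maximum_weight * boost), computed in long. */
    static long score(int suggestionWeight, int globalMaxWeight, int boost) {
        return suggestionWeight + (long) globalMaxWeight * boost;
    }

    public static void main(String[] args) {
        int globalMax = 10; // assumed largest index-time weight
        int separateBoost = matchedPrefixLength("seper", "separate");
        int superBoost = matchedPrefixLength("seper", "superstitious");
        System.out.println(score(2, globalMax, separateBoost));
        System.out.println(score(9, globalMax, superBoost));
    }
}
```

With these made-up weights, _separate_ (weight 2, three matched characters) scores 32 and _superstitious_ (weight 9, one matched character) scores 19, so the closer match wins even though its index-time weight is lower.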
> h4. ContextQuery
> Return documents that match a {{CompletionQuery}} filtered and/or boosted by provided
> {code}
> ContextQuery(CompletionQuery query)
> contextQuery.addContext(CharSequence context, int boost, boolean exact)
> {code}
> *NOTE:* {{ContextQuery}} should be used with {{ContextSuggestField}} to query suggestions
boosted and/or filtered by contexts
> h5. Scoring
> {{suggestion_weight + (global_maximum_weight * context_boost)}}
> where {{suggestion_weight}}, {{global_maximum_weight}} and {{context_boost}} are all integers.
> When used with {{FuzzyCompletionQuery}},
> {{suggestion_weight + (global_maximum_weight * (context_boost + fuzzy_boost))}}
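
As a rough sketch of the combined formula, assuming non-negative integer weights and boosts, and using {{long}} arithmetic to avoid {{int}} overflow (the class and method names are hypothetical, not part of the patch):

```java
final class ContextScoreSketch {
    /**
     * suggestion_weight + (global_maximum_weight * (context_boost + fuzzy_boost)).
     * Pass fuzzyBoost = 0 when the wrapped query is not a fuzzy query.
     */
    static long score(int suggestionWeight, int globalMaxWeight, int contextBoost, int fuzzyBoost) {
        long totalBoost = (long) contextBoost + fuzzyBoost;
        return suggestionWeight + globalMaxWeight * totalBoost;
    }

    public static void main(String[] args) {
        int globalMax = 100; // assumed largest index-time weight
        // Low-weight suggestion in a context boosted by 2.
        System.out.println(score(5, globalMax, 2, 0));
        // Maximum-weight suggestion in a context boosted by 1.
        System.out.println(score(100, globalMax, 1, 0));
    }
}
```

With these made-up numbers the first suggestion scores 205 and the second 200. Because no index-time weight exceeds {{global_maximum_weight}}, a suggestion with a larger total boost never scores below one with a smaller total boost, so the boost acts as the primary sort key and the index-time weight as the tie-breaker within a boost level.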
> h3. Context Suggest Field
> To use {{ContextQuery}}, use {{ContextSuggestField}} instead of {{SuggestField}}. Any
{{CompletionQuery}} can be used with {{ContextSuggestField}}; the default behaviour is to
return suggestions from *all* contexts. The {{Context}} for every completion hit can be accessed
through {{SuggestScoreDoc#context}}.
> {code}
> ContextSuggestField(String name, Collection<CharSequence> contexts, String value,
int weight) 
> {code}
