lucene-dev mailing list archives

From "Michael McCandless (JIRA)" <j...@apache.org>
Subject [jira] [Commented] (LUCENE-6459) [suggest] Query Interface for suggest API
Date Thu, 14 May 2015 09:37:00 GMT

    [ https://issues.apache.org/jira/browse/LUCENE-6459?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14543434#comment-14543434 ]

Michael McCandless commented on LUCENE-6459:
--------------------------------------------


bq. We should use this API for other suggesters, but maybe in a separate issue 

Yes, definitely separate!

bq. In terms of new functionality, fuzzy, regex and context queries are added. 

OK, cool, these are nice additions.

bq. query-time context boosts

Cool: so you boost some contexts more than others, using ContextQuery.

bq. As we only allow integer index-time weights

I thought we accept long (not int) as index-time weight?  (But, I
think that really is overkill... maybe they should just be floats, like
per-field boosting at index time).  But we can worry about this
later...

bq. one possibility would be to use index-weight + (Int.MAX * boost) instead of using MaxWeight
of suggestions

Sorry I don't understand the idea here?

{quote}
bq. If you try to use ContextQuery against a field that you had not indexed contexts with
(using ContextSuggestField) do you see any error? Maybe this is too hard.

There should not be any error. A ContextQuery will never be run on a SuggestField
{quote}

It seems like we could detect this misuse, since CompletionTerms seems to know whether the
field was indexed with contexts or not?  I.e., if I accidentally try to run a ContextQuery
against a field indexed with only SuggestField, it seems like I should get an exception saying
I screwed up ... (similar to trying to run a PhraseQuery on a field that did not index positions)?
Maybe add a simple test case?

{quote}
A ContextQuery will never be run on a SuggestField, 
CompletionQuery rewrites appropriately given the type of the field (context-enabled or not).

{quote}

OK, maybe that rewrite is the right time to throw the exception?
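
To make the suggestion concrete, here is a minimal sketch of a rewrite-time check. It does not use Lucene classes: the class name {{ContextMisuseCheck}}, the method {{checkContextQuery}}, the field-to-flag map, and the choice of {{IllegalStateException}} are all hypothetical stand-ins for whatever CompletionTerms actually exposes.

```java
import java.util.Map;

// Hypothetical stand-in, not a Lucene class: each field name maps to a flag
// recording whether the field was indexed with contexts.
public class ContextMisuseCheck {

    /** Throws if a context query targets a field that was indexed without contexts. */
    static void checkContextQuery(String field, Map<String, Boolean> hasContextsByField) {
        Boolean hasContexts = hasContextsByField.get(field);
        if (hasContexts == null) {
            return; // field unknown here: nothing can match, so nothing to flag
        }
        if (!hasContexts) {
            throw new IllegalStateException(
                "ContextQuery cannot be run against field '" + field
                    + "' because it was indexed without contexts");
        }
    }

    public static void main(String[] args) {
        Map<String, Boolean> fields = Map.of("title_ctx", true, "title_plain", false);
        checkContextQuery("title_ctx", fields); // context-enabled field: passes silently
        try {
            checkContextQuery("title_plain", fields);
        } catch (IllegalStateException e) {
            System.out.println(e.getMessage());
        }
    }
}
```

The check only fires in the one direction discussed here; a non-context query against a context-enabled field would still be allowed.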

{quote}
This also makes non-context queries work as expected when run against ContextSuggestField

(as in the query is wrapped as a ContextQuery with no context filtering/boosting).
{quote}

OK, for that direction it makes sense to allow ... good.

{quote}
bq. Are you allowed to mix ContextSuggestField and SuggestField even for the same field name,
within one suggester?

No you are not. If mixed, CompletionFieldsConsumer will throw IllegalArgumentException upon
indexing.
{quote}

OK, excellent.
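
As an illustration only, the mixing check described in the quote could look roughly like the sketch below. The class {{FieldKindTracker}}, its {{Kind}} enum, and {{register}} are hypothetical names; the real check lives in CompletionFieldsConsumer and may be implemented quite differently.

```java
import java.util.HashMap;
import java.util.Map;

// Hypothetical sketch, not Lucene code: remembers which kind of suggest field
// was first seen under each field name and rejects a later, different kind.
public class FieldKindTracker {

    enum Kind { SUGGEST, CONTEXT_SUGGEST }

    private final Map<String, Kind> seen = new HashMap<>();

    /** Records the field kind; throws if the name was already used with the other kind. */
    void register(String fieldName, Kind kind) {
        Kind previous = seen.putIfAbsent(fieldName, kind);
        if (previous != null && previous != kind) {
            throw new IllegalArgumentException(
                "field '" + fieldName + "' was indexed as " + previous
                    + " and cannot also be indexed as " + kind);
        }
    }

    public static void main(String[] args) {
        FieldKindTracker tracker = new FieldKindTracker();
        tracker.register("suggest", Kind.SUGGEST);
        tracker.register("suggest", Kind.SUGGEST); // same kind again: accepted
        try {
            tracker.register("suggest", Kind.CONTEXT_SUGGEST);
        } catch (IllegalArgumentException e) {
            System.out.println(e.getMessage());
        }
    }
}
```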

Can we rename {{TopSuggestDocsCollector.num()}} to maybe .getCountToCollect or something a
bit more verbose?

Net/net this is a nice change, thanks [~areek]!


> [suggest] Query Interface for suggest API
> -----------------------------------------
>
>                 Key: LUCENE-6459
>                 URL: https://issues.apache.org/jira/browse/LUCENE-6459
>             Project: Lucene - Core
>          Issue Type: New Feature
>          Components: core/search
>    Affects Versions: 5.1
>            Reporter: Areek Zillur
>            Assignee: Areek Zillur
>             Fix For: Trunk, 5.x, 5.1
>
>         Attachments: LUCENE-6459.patch, LUCENE-6459.patch, LUCENE-6459.patch, LUCENE-6459.patch,
LUCENE-6459.patch, LUCENE-6459.patch
>
>
> This patch factors out common indexing/search API used by the recently introduced [NRTSuggester|https://issues.apache.org/jira/browse/LUCENE-6339]. The
motivation is to provide a query interface for FST-based fields (*SuggestField* and *ContextSuggestField*)
for enabling suggestion scoring and more powerful automaton queries. 
> Previously, only prefix ‘queries’ with index-time weights were supported but we can
also support:
> * Prefix queries expressed as regular expressions:  get suggestions that match multiple
prefixes
>       ** Example: _star\[wa\|tr\]_ matches _starwars_ and _startrek_
> * Fuzzy Prefix queries supporting scoring: get typo tolerant suggestions scored by how
close they are to the query prefix
>     ** Example: querying for _seper_ will score _separate_ higher than _superstitious_
> * Context Queries: get suggestions boosted and/or filtered based on their indexed contexts
(meta data)
>     ** Example: get typo tolerant suggestions on song names with prefix _like a roling_
boosting songs with genre _rock_ and _indie_
>     ** Example: get suggestion on all file names starting with _finan_ only for _user1_
and _user2_
> h3. Suggest API
> {code}
> SuggestIndexSearcher searcher = new SuggestIndexSearcher(reader);
> CompletionQuery query = ...
> TopSuggestDocs suggest = searcher.suggest(query, num);
> {code}
> h3. CompletionQuery
> *CompletionQuery* is used to query *SuggestField* and *ContextSuggestField*. A *CompletionQuery*
produces a *CompletionWeight*, which allows *CompletionQuery* implementations to pass in an
automaton that will be intersected with a FST and allows boosting and meta data extraction
from the intersected partial paths. A *CompletionWeight* produces a *CompletionScorer*. A
*CompletionScorer* executes a Top N search against the FST with the provided automaton, scoring
and filtering all matched paths. 
> h4. PrefixCompletionQuery
> Return documents with values that match the prefix of an analyzed term text 
> Documents are sorted according to their suggest field weight. 
> {code}
> PrefixCompletionQuery(Analyzer analyzer, Term term)
> {code}
> h4. RegexCompletionQuery
> Return documents with values that match the prefix of a regular expression
> Documents are sorted according to their suggest field weight.
> {code}
> RegexCompletionQuery(Term term)
> {code}
> h4. FuzzyCompletionQuery
> Return documents with values that have prefixes within a specified edit distance of an
analyzed term text.
> Documents are ‘boosted’ by the number of matching prefix letters of the suggestion
with respect to the original term text.
> {code}
> FuzzyCompletionQuery(Analyzer analyzer, Term term)
> {code}
> h5. Scoring
> {{suggestion_weight + (global_maximum_weight * boost)}}
> where {{suggestion_weight}}, {{global_maximum_weight}} and {{boost}} are all integers.

> {{boost = # of prefix characters matched}}
> h4. ContextQuery
> Return documents that match a {{CompletionQuery}} filtered and/or boosted by provided
context(s). 
> {code}
> ContextQuery(CompletionQuery query)
> contextQuery.addContext(CharSequence context, int boost, boolean exact)
> {code}
> *NOTE:* {{ContextQuery}} should be used with {{ContextSuggestField}} to query suggestions
boosted and/or filtered by contexts
> h5. Scoring
> {{suggestion_weight + (global_maximum_weight * context_boost)}}
> where {{suggestion_weight}}, {{global_maximum_weight}} and {{context_boost}} are all
integers
> When used with {{FuzzyCompletionQuery}},
> {{suggestion_weight + (global_maximum_weight * (context_boost + fuzzy_boost))}}
> h3. Context Suggest Field
> To use {{ContextQuery}}, use {{ContextSuggestField}} instead of {{SuggestField}}. Any
{{CompletionQuery}} can be used with {{ContextSuggestField}}, the default behaviour is to
return suggestions from *all* contexts. {{Context}} for every completion hit can be accessed
through {{SuggestScoreDoc#context}}.
> {code}
> ContextSuggestField(String name, Collection<CharSequence> contexts, String value,
int weight) 
> {code}
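
The two scoring formulas in the quoted description can be worked through with plain arithmetic. The sketch below is not Lucene code: {{SuggestScoreSketch}} and its methods are hypothetical, the sample weights are made up, and it assumes {{long}} arithmetic (rather than the integers the description mentions) so that the multiplication cannot silently overflow.

```java
// Hypothetical helper, not part of Lucene: evaluates the scoring formulas
// from the issue description using overflow-checked long arithmetic.
public class SuggestScoreSketch {

    /** suggestion_weight + (global_maximum_weight * boost) */
    static long score(long suggestionWeight, long globalMaximumWeight, int boost) {
        return Math.addExact(suggestionWeight,
            Math.multiplyExact(globalMaximumWeight, (long) boost));
    }

    /** suggestion_weight + (global_maximum_weight * (context_boost + fuzzy_boost)) */
    static long contextFuzzyScore(long suggestionWeight, long globalMaximumWeight,
                                  int contextBoost, int fuzzyBoost) {
        return score(suggestionWeight, globalMaximumWeight,
            Math.addExact(contextBoost, fuzzyBoost));
    }

    public static void main(String[] args) {
        long maxWeight = 100; // made-up global maximum weight

        // Fuzzy query: boost = number of prefix characters matched.
        System.out.println(score(10, maxWeight, 5)); // prints 510
        System.out.println(score(90, maxWeight, 4)); // prints 490

        // Context query combined with a fuzzy query.
        System.out.println(contextFuzzyScore(10, maxWeight, 2, 5)); // prints 710
    }
}
```

Because the boost is multiplied by the global maximum weight, a suggestion with a larger boost outranks one with a smaller boost regardless of index-time weight, as long as every weight stays at or below that maximum.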



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)

---------------------------------------------------------------------
To unsubscribe, e-mail: dev-unsubscribe@lucene.apache.org
For additional commands, e-mail: dev-help@lucene.apache.org

