lucene-dev mailing list archives

From "Robert Muir (JIRA)" <j...@apache.org>
Subject [jira] [Commented] (LUCENE-3130) Use BoostAttribute in TokenFilters to denote Terms that QueryParser should give lower boosts
Date Sat, 21 May 2011 13:19:47 GMT

    [ https://issues.apache.org/jira/browse/LUCENE-3130?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13037386#comment-13037386 ]

Robert Muir commented on LUCENE-3130:
-------------------------------------

Hi Hoss Man,

I don't think I agree that a boost attribute is the best way to implement this.

A QP can already solve this issue today, simply by boosting down terms with positionIncrement
= 0. This would solve all of the cases you listed, without making these tokenstreams more
complicated.
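
As a rough illustration of the positionIncrement = 0 idea, here is a minimal sketch in plain Java. It does not use Lucene classes: Token, Weighted, weigh() and the stackedBoost parameter are all hypothetical names invented for this example, standing in for whatever a query parser would read from PositionIncrementAttribute while consuming the stream.

```java
import java.util.ArrayList;
import java.util.List;

final class PosIncBoostSketch {

    // Hypothetical stand-in for an analyzed token; not a Lucene class.
    static final class Token {
        final String term;
        final int positionIncrement;

        Token(String term, int positionIncrement) {
            this.term = term;
            this.positionIncrement = positionIncrement;
        }
    }

    // Hypothetical term/boost pair that a query parser could turn into a clause.
    static final class Weighted {
        final String term;
        final float boost;

        Weighted(String term, float boost) {
            this.term = term;
            this.boost = boost;
        }
    }

    // Tokens stacked on the previous position (positionIncrement == 0) receive
    // stackedBoost; all other tokens keep the default boost of 1.0.
    static List<Weighted> weigh(List<Token> tokens, float stackedBoost) {
        List<Weighted> out = new ArrayList<>();
        for (Token t : tokens) {
            float boost = (t.positionIncrement == 0) ? stackedBoost : 1.0f;
            out.add(new Weighted(t.term, boost));
        }
        return out;
    }
}
```

For the stream quick(+1), fast(+0), fox(+1) and a stackedBoost of 0.5, only "fast" is boosted down. The policy lives entirely in the consumer, and the token stream itself carries no boost.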

If such a QP really needs to know more than positionIncrement=0, then a better approach would
be to set token types (need not be TypeAttribute, could be something more strongly-typed)
to indicate synonym, phonetic variation, etc.
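
A strongly-typed variant could look something like the following sketch. It is only an assumption about what "more strongly-typed" might mean: the Kind enum, its values, and the numeric weights are made up for illustration and are not part of Lucene. The analysis chain would only label tokens, and the consumer would own the mapping from label to boost.

```java
import java.util.EnumMap;
import java.util.Map;

final class TokenKindSketch {

    // Hypothetical token kinds an analysis chain might report.
    enum Kind { ORIGINAL, SYNONYM, PHONETIC }

    // Consumer-side policy: the query parser, not the analysis chain,
    // decides how much each kind of token is worth. Values are arbitrary.
    static Map<Kind, Float> examplePolicy() {
        Map<Kind, Float> policy = new EnumMap<>(Kind.class);
        policy.put(Kind.ORIGINAL, 1.0f);
        policy.put(Kind.SYNONYM, 0.5f);
        policy.put(Kind.PHONETIC, 0.25f);
        return policy;
    }

    // Kinds the policy does not mention fall back to a neutral boost of 1.0.
    static float boostFor(Kind kind, Map<Kind, Float> policy) {
        return policy.getOrDefault(kind, 1.0f);
    }
}
```

A different consumer (a highlighter, say) could ignore the policy entirely and still make use of the same labels.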

But I really think the implementation details of QP should remain in QP; the analysis chain
should instead be general and describe the text.

Otherwise, things get really confusing, e.g. what should a ShingleFilter do when it combines
two tokens that have different BoostAttributes? But with types, this is no problem at all,
because the ShingleFilter can simply set the type to 'shingle' and it's unambiguous; it's
up to the consumer to do whatever it wants with this.
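
The ShingleFilter ambiguity can be sketched in plain Java as well. Nothing here is Lucene API: Kind, Token, shingle() and candidateBoosts() are hypothetical names, and the three combination rules are just examples of equally defensible choices, not anything the issue proposes.

```java
final class ShingleTypeSketch {

    // Hypothetical token kinds; a shingle gets its own kind.
    enum Kind { WORD, SYNONYM, SHINGLE }

    // Hypothetical stand-in for an analyzed token; not a Lucene class.
    static final class Token {
        final String term;
        final Kind kind;

        Token(String term, Kind kind) {
            this.term = term;
            this.kind = kind;
        }
    }

    // With types there is exactly one sensible answer: whatever the kinds of
    // the inputs were, the combined token is a SHINGLE.
    static Token shingle(Token a, Token b) {
        return new Token(a.term + " " + b.term, Kind.SHINGLE);
    }

    // With per-token boosts there are several plausible answers and no
    // obvious winner: minimum, maximum, or product of the two inputs.
    static float[] candidateBoosts(float a, float b) {
        return new float[] { Math.min(a, b), Math.max(a, b), a * b };
    }
}
```

Combining a WORD with a SYNONYM always yields a SHINGLE, whereas combining boosts of 0.5 and 0.8 gives three different candidates depending on the rule picked.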



> Use BoostAttribute in TokenFilters to denote Terms that QueryParser should give lower
boosts
> -----------------------------------------------------------------------------------------------
>
>                 Key: LUCENE-3130
>                 URL: https://issues.apache.org/jira/browse/LUCENE-3130
>             Project: Lucene - Java
>          Issue Type: Improvement
>            Reporter: Hoss Man
>
> A recent thread asked if there was any way to use query-time synonyms such that matches
on the original term specified by the user would score higher than matches on the synonym.
It occurred to me later that a float Attribute could be set by the SynonymFilter in such
situations, and QueryParser could use that float as a boost in the resulting Query. (This
would be fairly straightforward for the simple "synonyms => BooleanQuery" case, but we'd
have to decide how to handle the case of synonyms with multiple terms that produce MTPQ, possibly
just punting for now.)
> Likewise, there may be other TokenFilters that "inject" artificial tokens at query time
where it also might make sense to have a reduced "boost" factor...
> * SynonymFilter
> * CommonGramsFilter
> * WordDelimiterFilter
> * etc...
> In all of these cases, the amount of the "boost" could be configured, and for back compat
could default to "1.0" (or null to not set a boost at all)
> Furthermore: if we add a new BoostAttrToPayloadAttrFilter that just copied the boost
attribute into the payload attribute, these same filters could give "penalizing" payloads
to terms when used at index time.

--
This message is automatically generated by JIRA.
For more information on JIRA, see: http://www.atlassian.com/software/jira


