lucene-dev mailing list archives

From "Robert Muir (JIRA)" <j...@apache.org>
Subject [jira] [Commented] (LUCENE-3130) Use BoostAttribute in TokenFilters to denote Terms that QueryParser should give lower boosts
Date Sun, 26 Jun 2011 11:04:47 GMT

    [ https://issues.apache.org/jira/browse/LUCENE-3130?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13055050#comment-13055050 ]

Robert Muir commented on LUCENE-3130:
-------------------------------------

{quote}
Currently I use a separate field for phonetic normalization and include it with a lower weight
in DisMax. If phonetic variant instead was stored alongside the original with posIncr=0 and
tokenType=phonetic, I could instead specify a deboost factor for phonetic terms and even highlighting
would work ootb!
{quote}

This doesn't make any sense to me: how is this "better" shoved into one field than kept in two?
I don't see any advantage at all. Field A with original terms and field B with phonetic terms
are no less efficient in the index than having field AB with both mixed up, and keeping them
separate keeps code and configurations simple.

As for the highlighting, that sounds like a highlighting problem, not an analysis problem.
If it's often the case that users use things like copyField and do this boosting, then highlighting
in Solr needs to be fixed to correlate the offsets back to the original stored field: but
we need not make analysis more complicated because of this limitation.


{quote}
If the LowerCaseFilter would keep the original token and add a lowercased token on same posIncr
with tokenType=lowercase, we could support case insensitive match with preference for correct
case.
{quote}

I don't think we should complicate our tokenfilters with such things: in this case I think
it would just make the code more complicated and make relevance worse: often case is totally
meaningless and boosting terms for some arbitrary reason will skew scores.

This is for the same reason as above. If you want to do this, I think you should use two fields,
one with no case, and one with case, and boost one of them. 
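
To make the two-field suggestion concrete, here is a rough sketch in plain Java. It is not Lucene code: the class name, the "exact" and "folded" fields, and the boost values are all hypothetical, and the scoring is a toy sum of field boosts rather than real relevance scoring. It only illustrates that a document matching in correct case scores higher than one matching only after case folding.

```java
import java.util.Arrays;
import java.util.HashSet;
import java.util.Locale;
import java.util.Set;

/** Toy model of the two-field approach; hypothetical names, not Lucene APIs. */
final class TwoFieldSketch {

    /** Hypothetical "exact" field: whitespace-split terms, case preserved. */
    static Set<String> exactField(String text) {
        return new HashSet<>(Arrays.asList(text.trim().split("\\s+")));
    }

    /** Hypothetical "folded" field: the same terms, lowercased. */
    static Set<String> foldedField(String text) {
        Set<String> terms = new HashSet<>();
        for (String term : text.trim().split("\\s+")) {
            terms.add(term.toLowerCase(Locale.ROOT));
        }
        return terms;
    }

    /** Adds each field's boost when the query term matches in that field. */
    static float score(String doc, String queryTerm, float exactBoost, float foldedBoost) {
        float score = 0f;
        if (exactField(doc).contains(queryTerm)) {
            score += exactBoost;
        }
        if (foldedField(doc).contains(queryTerm.toLowerCase(Locale.ROOT))) {
            score += foldedBoost;
        }
        return score;
    }

    public static void main(String[] args) {
        // Correct-case match hits both fields: 2.0 + 1.0
        System.out.println(score("Apple pie", "Apple", 2.0f, 1.0f)); // prints 3.0
        // Case-insensitive match hits only the folded field
        System.out.println(score("apple pie", "Apple", 2.0f, 1.0f)); // prints 1.0
    }
}
```

The preference for correct case lives entirely in the per-field boosts chosen at query time, so the analysis chain for each field stays a plain one-token-in, one-token-out pipeline.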


> Use BoostAttribute in TokenFilters to denote Terms that QueryParser should give lower boosts
> -----------------------------------------------------------------------------------------------
>
>                 Key: LUCENE-3130
>                 URL: https://issues.apache.org/jira/browse/LUCENE-3130
>             Project: Lucene - Java
>          Issue Type: Improvement
>            Reporter: Hoss Man
>
> A recent thread asked if there was any way to use query-time synonyms such that matches
> on the original term specified by the user would score higher than matches on the synonym.
> It occurred to me later that a float Attribute could be set by the SynonymFilter in such
> situations, and QueryParser could use that float as a boost in the resulting Query.  (This
> would be fairly straightforward for the simple "synonyms => BooleanQuery" case, but we'd
> have to decide how to handle the case of synonyms with multiple terms that produce MTPQ, possibly
> just punt for now.)
> Likewise, there may be other TokenFilters that "inject" artificial tokens at query time
> where it also might make sense to have a reduced "boost" factor...
> * SynonymFilter
> * CommonGramsFilter
> * WordDelimiterFilter
> * etc...
> In all of these cases, the amount of the "boost" could be configured, and for back compat
> could default to "1.0" (or null to not set a boost at all).
> Furthermore: if we add a new BoostAttrToPayloadAttrFilter that just copied the boost
> attribute into the payload attribute, these same filters (when used at index time) could
> give "penalizing" payloads to terms.
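
For readers unfamiliar with the proposal quoted above, the idea for the simple single-term synonym case can be sketched roughly as follows. This is plain Java, not Lucene code: `BoostedToken`, `expand`, and `toQuery` are hypothetical stand-ins for a float attribute on a token, a synonym filter that sets it, and a query parser that reads it. The output string uses the familiar `term^boost` query syntax purely for display.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.Map;

/** Toy model of the proposal; hypothetical names, not Lucene APIs. */
final class BoostedSynonymSketch {

    /** A term plus the boost a filter attached to it (stand-in for a float attribute). */
    static final class BoostedToken {
        final String term;
        final float boost;

        BoostedToken(String term, float boost) {
            this.term = term;
            this.boost = boost;
        }
    }

    /** Emits the original term at boost 1.0 and each injected synonym at synonymBoost. */
    static List<BoostedToken> expand(String term, Map<String, List<String>> synonyms,
                                     float synonymBoost) {
        List<BoostedToken> out = new ArrayList<>();
        out.add(new BoostedToken(term, 1.0f));
        for (String synonym : synonyms.getOrDefault(term, List.of())) {
            out.add(new BoostedToken(synonym, synonymBoost));
        }
        return out;
    }

    /** Renders the tokens as one OR group, writing non-default boosts as term^boost. */
    static String toQuery(List<BoostedToken> tokens) {
        StringBuilder sb = new StringBuilder("(");
        for (int i = 0; i < tokens.size(); i++) {
            if (i > 0) {
                sb.append(' ');
            }
            BoostedToken token = tokens.get(i);
            sb.append(token.term);
            if (token.boost != 1.0f) {
                sb.append('^').append(token.boost);
            }
        }
        return sb.append(')').toString();
    }

    public static void main(String[] args) {
        Map<String, List<String>> synonyms = Map.of("fast", List.of("quick", "rapid"));
        System.out.println(toQuery(expand("fast", synonyms, 0.5f)));
        // prints (fast quick^0.5 rapid^0.5)
    }
}
```

The sketch deliberately leaves out the multi-term synonym case that the description says is still undecided.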

--
This message is automatically generated by JIRA.
For more information on JIRA, see: http://www.atlassian.com/software/jira

        

---------------------------------------------------------------------
To unsubscribe, e-mail: dev-unsubscribe@lucene.apache.org
For additional commands, e-mail: dev-help@lucene.apache.org

